Fix spurious resource replacements during API version upgrades with refresh #4429

Merged
EronWright merged 7 commits into master from eronwright/fix-api-version-upgrade-refresh on Nov 25, 2025

@EronWright EronWright commented Nov 19, 2025

Contributor

Problem

When upgrading resources to a new API version using aliases and running pulumi up --refresh, the provider incorrectly reported spurious property changes and marked resources for replacement, even though the actual Azure resource hadn't changed.

Example scenario:

  • User has a ManagedCluster in state using API version 2024-01-02-preview
  • User upgrades provider to v3 which defaults to 2024-10-02-preview
  • User runs pulumi up --refresh
  • Provider incorrectly shows properties being added and marks resource for replacement

Reported in: #4400

Root Cause

The provider uses "input-input diffing" to detect changes:

  1. Convert old state outputs → inputs (using schema)
  2. Convert new Azure outputs → inputs (using schema)
  3. Diff these to find actual changes

The bug: When the API version changed via an alias, the provider used the NEW schema to normalize BOTH the old and the new state. As a result, properties that exist only in the new schema appeared as "additions".
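A minimal sketch of input-input diffing may help here. It is not the provider's code: the schema is modeled as a plain set of property names, and `projectToInputs` and `addedKeys` are hypothetical helpers invented for illustration.

```go
package main

import "sort"

// projectToInputs keeps only the output properties that the given schema
// accepts as inputs. The schema is modeled as a set of property names,
// which is a simplification made for this sketch.
func projectToInputs(schema map[string]bool, outputs map[string]interface{}) map[string]interface{} {
	inputs := make(map[string]interface{})
	for name, value := range outputs {
		if schema[name] {
			inputs[name] = value
		}
	}
	return inputs
}

// addedKeys returns, in sorted order, the keys that are present in
// newInputs but absent from oldInputs.
func addedKeys(oldInputs, newInputs map[string]interface{}) []string {
	var added []string
	for name := range newInputs {
		if _, ok := oldInputs[name]; !ok {
			added = append(added, name)
		}
	}
	sort.Strings(added)
	return added
}
```

In this model, projecting a state written under the old API version and a fresh response from the new API version through the same new schema reports every new-only property as added, which is the shape of the spurious diff described above.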

Concrete Example

Old state (deployed with v2.90.0 using API 2024-01-02-preview):

{
  "dnsPrefix": "test-aks-4400",
  "location": "eastus",
  "agentPoolProfiles": [...],
  "azureApiVersion": "2024-01-02-preview"
}

New Azure response (read with API 2024-10-02-preview):

{
  "dnsPrefix": "test-aks-4400",
  "location": "eastus",
  "agentPoolProfiles": [...],
  "kind": "Base",                           // New property in 2024-10-02-preview
  "networkProfile": {
    "podLinkLocalAccess": "IMDS"           // New property in 2024-10-02-preview
  },
  "azureApiVersion": "2024-10-02-preview"
}

Before fix: Both states normalized with 2024-10-02-preview schema

  • Old state projection: {dnsPrefix, location, agentPoolProfiles} (missing new properties)
  • New state projection: {dnsPrefix, location, agentPoolProfiles, kind, networkProfile}
  • Diff: Shows kind and networkProfile.podLinkLocalAccess being added → triggers replacement ❌

After fix: Schema-aware normalization

  • Old state projection with 2024-01-02-preview schema: {dnsPrefix, location, agentPoolProfiles}
  • New state projection with 2024-10-02-preview schema: {dnsPrefix, location, agentPoolProfiles} (new properties dropped as defaults)
  • Diff: No changes

Solution

Modified the Read method in provider.go to detect API version changes and use schema-aware normalization:

  1. Detect API version change: Compare azureApiVersion from state vs current metadata
  2. Look up old metadata: Find resource metadata for the old API version
  3. Use correct schemas:
    • Old state → old schema → inputs projection
    • New state → new schema → inputs projection
  4. Diff the correct projections: Now comparing apples-to-apples
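The selection logic in steps 1 to 3 can be sketched in isolation as follows. This is an illustrative model, not the provider's implementation: `resourceMeta`, `metaForOldState` and `lookupIn` are hypothetical names, and falling back to the current metadata when the lookup fails is modeled on the excerpt under Key Code Changes.

```go
package main

import "fmt"

// resourceMeta is a hypothetical stand-in for the provider's per-version
// resource metadata.
type resourceMeta struct {
	APIVersion string
}

// metaForOldState picks the metadata used to normalize the old state.
// It returns the current metadata when the state carries no usable
// azureApiVersion, when the version is unchanged, or when the lookup
// for the old version fails.
func metaForOldState(
	oldState map[string]interface{},
	current resourceMeta,
	lookup func(apiVersion string) (resourceMeta, error),
) resourceMeta {
	stateVersion, ok := oldState["azureApiVersion"].(string)
	if !ok || stateVersion == current.APIVersion {
		return current
	}
	old, err := lookup(stateVersion)
	if err != nil {
		return current
	}
	return old
}

// lookupIn builds a lookup function over a fixed set of known versions,
// standing in for a metadata lookup by API version.
func lookupIn(known map[string]resourceMeta) func(string) (resourceMeta, error) {
	return func(apiVersion string) (resourceMeta, error) {
		meta, ok := known[apiVersion]
		if !ok {
			return resourceMeta{}, fmt.Errorf("no metadata for API version %s", apiVersion)
		}
		return meta, nil
	}
}
```

The new state is always projected with the current metadata, so only the old-state side needs this choice.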

Key Code Changes

API Version Detection (lines 1376-1392):

// Check if API version has changed (e.g., via alias during upgrade).
var oldApiVersion string
var oldRes *resources.AzureAPIResource
if azureApiVersion, ok := oldState["azureApiVersion"]; ok && azureApiVersion.IsString() {
    oldApiVersion = azureApiVersion.StringValue()
    if oldApiVersion != res.APIVersion {
        // API version has changed. Try to look up the old resource metadata.
        logging.V(5).Infof("%s: API version changed from %s to %s", label, oldApiVersion, res.APIVersion)
        oldRes, err = k.lookupResourceWithAPIVersion(urn, oldApiVersion)
        if err != nil {
            logging.V(5).Infof("%s: could not look up old resource metadata for API version %s: %v", label, oldApiVersion, err)
            oldRes = nil
        }
    }
}

Schema-Aware Normalization (lines 1460-1477):

// Use old resource metadata if API version changed
resForDefaults := res
if oldRes != nil {
    resForDefaults = oldRes
}
removeDefaults(*resForDefaults, plainOldState, previousInputs.Mappable())

// Project old outputs using old schema if API version changed
var oldInputProjection map[string]interface{}
if oldRes != nil {
    // API version changed: use old schema for converting old state
    oldInputProjection = k.converter.SdkOutputsToSdkInputs(oldRes.PutParameters, plainOldState)
} else {
    // Same API version: use current schema
    oldInputProjection = k.converter.SdkOutputsToSdkInputs(res.PutParameters, plainOldState)
}

Testing

Unit tests:

  • Added TestReadWithApiVersionMismatch for Read() with API version mismatch
  • Validates old schema lookup via lookupResourceWithAPIVersion()
  • Confirms azureApiVersion output update after Read
  • Added TestApiVersionToVersionPart for API version conversion utilities

E2E regression test:

  • Added TestUpgradeAksApiVersion_2_90_0 replay-based test
  • Validates v2.90.0 → v3 upgrade scenario (versioned → unversioned types)
  • Uses recorded gRPC interactions from real v2.90.0 ManagedCluster deployment
  • Asserts no replacements during provider upgrade preview
  • Comprehensive documentation in test-programs/upgrade-aks-api-version/README.md

Manual testing:

  • Deployed ManagedCluster with v2.90.0 using API version 2024-01-02-preview
  • Upgraded to v3 provider (defaults to 2024-10-02-preview)
  • Ran pulumi up --refresh - no spurious replacements
  • Verified azureApiVersion updated correctly
  • Properties like kind and networkProfile.podLinkLocalAccess handled correctly as defaults

Additional Enhancements

User-facing warning for API version changes:

  • Added warning in Diff() when API version mismatch is detected between state and provider metadata
  • Suggests running pulumi refresh to align schemas and update resource state
  • Helps users understand when refresh operations are beneficial after provider upgrades
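A rough sketch of the mismatch check behind such a warning is shown below. The function name `apiVersionWarning` and the message wording are illustrative assumptions, not the text the provider emits.

```go
package main

import "fmt"

// apiVersionWarning returns a warning message when the API version
// recorded in state differs from the provider's current one, and an
// empty string otherwise. A missing or non-string azureApiVersion is
// treated as "nothing to warn about". The wording is hypothetical.
func apiVersionWarning(oldState map[string]interface{}, currentAPIVersion string) string {
	stateVersion, ok := oldState["azureApiVersion"].(string)
	if !ok || stateVersion == "" || stateVersion == currentAPIVersion {
		return ""
	}
	return fmt.Sprintf(
		"state was written with API version %s but the provider now uses %s; consider running `pulumi refresh`",
		stateVersion, currentAPIVersion)
}
```

Returning an empty string for the edge cases keeps the caller simple: it only logs when the result is non-empty.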

Code simplifications:

  • Leveraged existing openapi.ApiToSdkVersion() and openapi.SdkToApiVersion() utilities for format conversion
  • Used resources.ParseToken() and resources.BuildToken() helpers for type token manipulation
  • Improved error messages to distinguish between missing resources vs unavailable API versions
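To make the token manipulation concrete, here is a self-contained sketch of how a candidate type token for the old API version might be derived. It does not call the real `openapi` or `resources` packages; `apiToSdkVersion` and `candidateToken` are hypothetical stand-ins, and the conversion rule (drop the dashes, prefix with `v`) is an assumption inferred from the pairing of 2024-01-02-preview with v20240102preview in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// apiToSdkVersion converts an Azure API version such as
// "2024-01-02-preview" into the SDK form "v20240102preview".
// The rule is an assumption of this sketch.
func apiToSdkVersion(apiVersion string) string {
	return "v" + strings.ReplaceAll(apiVersion, "-", "")
}

// candidateToken rewrites a type token so that its module carries the
// SDK version derived from apiVersion. It accepts both versioned
// ("pkg:module/vYYYYMMDD:Name") and unversioned ("pkg:module:Name")
// tokens.
func candidateToken(token, apiVersion string) (string, error) {
	parts := strings.Split(token, ":")
	if len(parts) != 3 {
		return "", fmt.Errorf("malformed type token %q", token)
	}
	// Drop any version already present in the module part.
	module, _, _ := strings.Cut(parts[1], "/")
	return parts[0] + ":" + module + "/" + apiToSdkVersion(apiVersion) + ":" + parts[2], nil
}
```

For example, under these assumptions both `azure-native:containerservice:ManagedCluster` and `azure-native:containerservice/v20241002preview:ManagedCluster` map to `azure-native:containerservice/v20240102preview:ManagedCluster` for API version 2024-01-02-preview.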

Files Changed

  • provider/pkg/provider/provider.go - Core fix, helper methods, and user warnings
  • provider/pkg/provider/provider_test.go - Unit test for Read() with API version mismatch
  • provider/pkg/provider/resource_lookup_test.go - Unit tests for API version conversion
  • provider/pkg/provider/provider_e2e_test.go - E2E regression test infrastructure
  • provider/pkg/provider/test-programs/upgrade-aks-api-version/ - E2E test program and recordings
  • CLAUDE.md - Documentation on testing API version upgrades

Fixes #4400

@github-actions

Contributor

Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

@codecov

codecov Bot commented Nov 19, 2025

Codecov Report

❌ Patch coverage is 56.36364% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.47%. Comparing base (39412ea) to head (53c9387).
⚠️ Report is 1 commits behind head on master.

Files with missing lines | Patch % | Lines
provider/pkg/provider/provider.go | 56.36% | 12 Missing and 12 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4429      +/-   ##
==========================================
+ Coverage   59.44%   59.47%   +0.03%     
==========================================
  Files          91       91              
  Lines       11480    11531      +51     
==========================================
+ Hits         6824     6858      +34     
- Misses       4020     4024       +4     
- Partials      636      649      +13     

@EronWright

Contributor Author

✅ Fix Verified - Spurious Replacement Eliminated

Successfully tested the improved fix (commit 87fd4f5) using the blind variant reproduction scenario. The fix eliminates the spurious replacement bug reported in #4400.

Test Setup

Starting State (Critical Mixed Condition):

  • Type token: azure-native:containerservice/v20241002preview:ManagedCluster (new)
  • azureApiVersion in state: 2024-01-02-preview (old)
  • This mixed state was created via "blind update" (pulumi up without --refresh after API version upgrade)

Provider Version: Built from eronwright/fix-api-version-upgrade-refresh branch with simplified fix

Test Results

Original Bug Behavior (v2.90.0)

 ~  azure-native:containerservice/v20241002preview:ManagedCluster test-cluster refresh [diff: +kind,networkProfile~agentPoolProfiles]
 ++ azure-native:containerservice/v20241002preview:ManagedCluster test-cluster create replacement
 +- azure-native:containerservice/v20241002preview:ManagedCluster test-cluster replace
 -- azure-native:containerservice/v20241002preview:ManagedCluster test-cluster delete original
Resources:
    +-1 to replace

Problem: The provider read the resource using the NEW API version (2024-10-02-preview) and compared the result with state containing OLD API version outputs, producing false-positive property differences.

Fixed Behavior (This PR)

 ~  azure-native:resources:ResourceGroup aks-test-rg refresh 
 ~  azure-native:containerservice/v20241002preview:ManagedCluster test-cluster refresh 
Resources:
    3 unchanged

Solution: Provider now correctly looks up resource metadata using the OLD API version (2024-01-02-preview) from state's azureApiVersion field when performing refresh operations. This ensures schema consistency during cross-API-version comparisons.

State Evolution

Before Refresh:

{
  "type": "azure-native:containerservice/v20241002preview:ManagedCluster",
  "azureApiVersion": "2024-01-02-preview"
}

After pulumi up --refresh:

{
  "type": "azure-native:containerservice/v20241002preview:ManagedCluster",
  "azureApiVersion": "2024-10-02-preview"
}

The API version was seamlessly upgraded during refresh without triggering any spurious replacements. ✨

Key Improvements in This Revision

  1. Simplified implementation: Uses existing openapi.ApiToSdkVersion() conversion function instead of custom parsing logic (~45 lines of code eliminated)
  2. Added ARM path validation: Verifies resource identity by checking that the looked-up resource has the same ARM path
  3. Better maintainability: Leverages well-tested conversion infrastructure that handles all API version format edge cases
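The ARM path validation in item 2 could look roughly like the sketch below. The `resourceMeta` type and `checkSameARMPath` function are hypothetical, and comparing paths case-insensitively is an assumption of this example rather than something the PR states.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceMeta is a hypothetical stand-in for per-version resource
// metadata; Path holds the ARM path template of the resource.
type resourceMeta struct {
	APIVersion string
	Path       string
}

// checkSameARMPath returns an error when candidate does not describe the
// same ARM path as current. The case-insensitive comparison is an
// assumption made for this sketch.
func checkSameARMPath(current, candidate resourceMeta) error {
	if !strings.EqualFold(current.Path, candidate.Path) {
		return fmt.Errorf("API version %s resolves to path %q, expected %q",
			candidate.APIVersion, candidate.Path, current.Path)
	}
	return nil
}
```

A check of this kind guards against a looked-up type token that happens to exist for the old API version but refers to a different Azure resource.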

Test Logs

  • Baseline bug reproduction: /tmp/preview-refresh-blind-variant.log
  • Fixed behavior: /tmp/preview-refresh-FIXED-no-run-program.log
  • Final verification: /tmp/up-refresh-FIXED.log

The fix is working as intended and ready for review!

EronWright added a commit that referenced this pull request Nov 21, 2025
Adds replay test to prevent regression of spurious replacements when
upgrading from v2.90.0 (versioned containerservice/v20240102preview)
to v3 (unversioned containerservice with default API version).

Test validates fix in lookupResourceWithAPIVersion() that ensures
provider uses OLD API version from state during refresh, not NEW
API version from provider metadata.

Key aspects:
- Two-program structure: v2 uses versioned type, v3 uses unversioned
- No explicit alias needed: Pulumi automatically aliases compatible types
- Recorded gRPC interactions enable fast CI replay without Azure credentials
- Validates the "mixed state" condition where azureApiVersion differs
  from provider metadata default

Includes:
- TestUpgradeAksApiVersion_2_90_0 test function
- test-programs/upgrade-aks-api-version/ (v2 program)
- test-programs/upgrade-aks-api-version/v3/ (v3 program)
- Recorded gRPC interactions with v2.90.0 for replay
- Comprehensive README documenting test strategy

Related to #4400, PR #4429

@EronWright EronWright marked this pull request as ready for review November 21, 2025 18:15

Comment thread FIX_SUMMARY_4400.md Outdated

EronWright and others added 7 commits November 24, 2025 14:40
Fixes #4400

When upgrading a resource to a new API version (e.g., ManagedCluster from
v20240102preview to v20241002preview), `pulumi up --refresh` incorrectly
triggered resource replacement even though no actual changes were made.

Root Cause:
-----------
During `pulumi up --refresh` with an API version change:
1. The resource URN type updates to the new API version (via alias)
2. The Read method looks up metadata using the NEW URN type
3. When normalizing old state to inputs, it used the NEW schema for BOTH
   old state (from old API version) and new state (from Azure)
4. This schema mismatch caused properties unique to the new API version
   to appear as "additions", triggering spurious diffs and replacements

The Fix:
--------
The Read method now detects API version changes by comparing the
azureApiVersion in state with the current resource metadata version.
When a change is detected:

1. Look up the old API version's resource metadata
2. Use the OLD schema when normalizing old state to inputs
3. Use the NEW schema when normalizing new state to inputs
4. Calculate diff between properly schema-aligned projections
5. Fall back to preserving old inputs if old metadata unavailable

This ensures the provider's "input-input diffing" architecture correctly
handles cross-version state migration during refresh operations.

Implementation Details:
----------------------
- Added lookupResourceWithAPIVersion() to fetch metadata for specific API versions
- Added apiVersionToVersionPart() to convert API version formats
- Modified Read() to perform schema-aware state normalization
- Added unit tests for API version format conversion
- All existing provider tests pass (no regressions)

The fix is backward compatible and only affects behavior during API
version upgrades via refresh operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Improved the lookupResourceWithAPIVersion implementation by:
- Using openapi.ApiToSdkVersion() instead of custom parsing logic
- Adding ARM path validation to verify resource identity
- Removing apiVersionToVersionPart() helper and its tests

This reduces code complexity while improving robustness by leveraging
well-tested conversion functions that handle all API version formats.

Related to #4400

Replace manual type token parsing with well-tested utilities from the
resources and openapi packages:

- Use resources.ParseToken() instead of manual string splitting
- Use openapi.ApiToSdkVersion() for canonical version conversion
- Use resources.BuildToken() to construct candidate type tokens
- Improve error messages to distinguish missing resources from
  unavailable API versions

This makes the code more maintainable and ensures consistent handling
of both versioned (azure-native:compute/v20210301:VM) and unversioned
(azure-native:compute:VM) type tokens across v2 and v3 providers.

Related to #4400

When the Diff method detects that the API version in state differs from
the current provider metadata, emit a warning suggesting the user run
`pulumi refresh` to update the resource state with the new API version
schema.

This helps users understand that running refresh will provide better
schema alignment and avoid potential spurious diffs when API versions
change (e.g., during provider upgrades).

The warning:
- Only appears when there's an actual API version mismatch
- Uses the same detection pattern as the Read method fix for issue #4400
- Handles edge cases gracefully (missing azureApiVersion, invalid types)
- Provides clear, actionable guidance

Related to #4400

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Tests the fix for issue #4400 where API version changes during provider
upgrade caused spurious replacements during refresh.

The test validates that:
- Read() succeeds when state has old API version but provider has new
- Old schema is used for normalizing old state (via lookupResourceWithAPIVersion)
- azureApiVersion output is updated to new version after Read
- No errors occur during schema-aware diff calculation

Uses mock resource map with both old (2024-01-02-preview) and new
(2024-10-02-preview) API version metadata to simulate the upgrade scenario.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@EronWright EronWright force-pushed the eronwright/fix-api-version-upgrade-refresh branch from 15a455c to 53c9387 Compare November 24, 2025 22:50
@EronWright EronWright enabled auto-merge (squash) November 24, 2025 22:51
@EronWright EronWright disabled auto-merge November 25, 2025 00:02
@EronWright EronWright merged commit e579916 into master Nov 25, 2025
25 of 32 checks passed
@EronWright EronWright deleted the eronwright/fix-api-version-upgrade-refresh branch November 25, 2025 00:07
@pulumi-bot

Contributor

This PR has been shipped in release v3.11.0.

Development

Successfully merging this pull request may close these issues.

Updating Azure V2 preview API for containerservice causing recreation on pulumi up --refresh

4 participants