Skip to content

[v17] Fix RemoteCluster Update revision mismatch#56974

Merged
smallinsky merged 1 commit intobranch/v17from
smallinsky/fix_remote_cluster_update_cache
Jul 23, 2025
Merged

[v17] Fix RemoteCluster Update revision mismatch#56974
smallinsky merged 1 commit intobranch/v17from
smallinsky/fix_remote_cluster_update_cache

Conversation

@smallinsky
Copy link
Copy Markdown
Contributor

@smallinsky smallinsky commented Jul 21, 2025

What

This PR addresses a caching issue in v17 related to the RemoteCluster object, where the cached revision becomes inconsistent due to an unintended revision overwrite during the caching process.

Flow:
1. A RemoteCluster object is created in the backend -> revision A.
2. The cache executor receives the object with revision A.
3. It calls CreateRemoteCluster.
4. This invokes AtomicWrite, which creates a new local revision.
5. The object in cache ends up with a different (random) revision than the original.

In the result item := Get(name) - > Update(item) call will fail.

The fix switches from the legacy collection to the new collection backend, where the storage layer that overrides the revision during caching is bypassed - Objects are stored directly in a btree.

v18 >= not affected.

changelog: Resolved an issue where RemoteCluster objects stored in the cache had incorrect revisions, causing Update calls to fail.

@smallinsky smallinsky force-pushed the smallinsky/fix_remote_cluster_update_cache branch from b384761 to c97ec20 Compare July 21, 2025 09:49
@smallinsky smallinsky marked this pull request as ready for review July 21, 2025 09:51
@github-actions github-actions bot requested a review from capnspacehook July 21, 2025 09:52
@smallinsky smallinsky force-pushed the smallinsky/fix_remote_cluster_update_cache branch from a7096b2 to 49b7888 Compare July 21, 2025 09:53
@smallinsky smallinsky changed the title Fix RemoteCluster Update revision mismatch [v17] Fix RemoteCluster Update revision mismatch Jul 21, 2025
@smallinsky smallinsky changed the title [v17] Fix RemoteCluster Update revision mismatch Fix RemoteCluster Update revision mismatch Jul 21, 2025
@smallinsky smallinsky force-pushed the smallinsky/fix_remote_cluster_update_cache branch 3 times, most recently from 70bcdc7 to 03a501b Compare July 21, 2025 10:24
Comment thread lib/cache/collections.go Outdated
Comment thread lib/cache/remote_cluster.go Outdated
Comment thread lib/cache/remote_cluster.go Outdated
Comment thread lib/cache/remote_cluster.go Outdated
Comment thread lib/cache/remote_cluster_test.go Outdated
},
})

t.Run("test cluster get/update", func(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new test case? Should it be forward ported to v18+ to prevent the same bug from being reintroduced?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should have a generic test coverage for that. not only for the Remote Cluster resource. Fortunately the alignment with add pagination test for cache rescue will validate this case on each resources.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the pagination tests changes you recently made would cover this case can we backport them to v17 too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but backport the pagination UTs the issue needs to be resolved on v17 first.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the pagination test changes depend on this PR?

Copy link
Copy Markdown
Contributor Author

@smallinsky smallinsky Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR address the issue found by pagination tests. So without this PR with the fix the pagination tests can be merged.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a TODO to remove this when the pagination tests land then?

Copy link
Copy Markdown
Contributor Author

@smallinsky smallinsky Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added:

  // TODO(smallinsky): Remove this once pagination tests covering this case for each resource type
  // have been merged into v17.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Comment thread lib/cache/remote_cluster_test.go Outdated
@smallinsky smallinsky force-pushed the smallinsky/fix_remote_cluster_update_cache branch 3 times, most recently from 4831066 to 53a8b6a Compare July 21, 2025 13:19
@zmb3 zmb3 changed the title Fix RemoteCluster Update revision mismatch [v17] Fix RemoteCluster Update revision mismatch Jul 21, 2025
@smallinsky smallinsky requested a review from rosstimothy July 21, 2025 20:09
@smallinsky smallinsky force-pushed the smallinsky/fix_remote_cluster_update_cache branch from 53a8b6a to 70336d7 Compare July 23, 2025 09:25
@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from capnspacehook July 23, 2025 11:10
@smallinsky smallinsky added this pull request to the merge queue Jul 23, 2025
Merged via the queue into branch/v17 with commit 559e410 Jul 23, 2025
38 checks passed
@smallinsky smallinsky deleted the smallinsky/fix_remote_cluster_update_cache branch July 23, 2025 12:22
@doggydogworld doggydogworld mentioned this pull request Jul 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants