[v17] Fix RemoteCluster Update revision mismatch #56974
},
})

t.Run("test cluster get/update", func(t *testing.T) {
Is this a new test case? Should it be forward ported to v18+ to prevent the same bug from being reintroduced?
I think we should have generic test coverage for this, not only for the RemoteCluster resource. Fortunately, the work to add pagination tests for the cache will validate this case for each resource.
If the pagination test changes you recently made cover this case, can we backport them to v17 too?
Yes, but to backport the pagination unit tests, the issue needs to be resolved on v17 first.
Why do the pagination test changes depend on this PR?
This PR addresses the issue found by the pagination tests, so without this fix the pagination tests cannot be merged.
Let's add a TODO to remove this when the pagination tests land then?
Added:
// TODO(smallinsky): Remove this once pagination tests covering this case for each resource type
// have been merged into v17.
What
This PR addresses a caching issue in v17 related to the RemoteCluster object, where the cached revision becomes inconsistent due to an unintended revision overwrite during the caching process.
Flow:
1. A RemoteCluster object is created in the backend -> revision A.
2. The cache executor receives the object with revision A.
3. It calls CreateRemoteCluster.
4. This invokes AtomicWrite, which creates a new local revision.
5. The object in cache ends up with a different (random) revision than the original.
As a result, an item := Get(name) -> Update(item) call sequence fails with a revision mismatch.
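The flow above can be sketched with a toy model. This is only an illustration under stated assumptions: the remoteCluster type, the store type, and the "backend-N"/"cache-N" revision format are hypothetical stand-ins, not Teleport's actual API, and the real write path generates random revisions rather than counters.

```go
package main

import (
	"errors"
	"fmt"
)

// remoteCluster is a hypothetical, simplified stand-in for the real resource.
type remoteCluster struct {
	Name     string
	Revision string
}

var errRevisionMismatch = errors.New("revision mismatch")

// store is a toy key-value store that supports conditional updates.
type store struct {
	prefix  string
	counter int
	items   map[string]remoteCluster
}

func newStore(prefix string) *store {
	return &store{prefix: prefix, items: map[string]remoteCluster{}}
}

// create mimics a write path that always mints a fresh revision,
// discarding whatever revision the caller passed in.
func (s *store) create(rc remoteCluster) remoteCluster {
	s.counter++
	rc.Revision = fmt.Sprintf("%s-%d", s.prefix, s.counter)
	s.items[rc.Name] = rc
	return rc
}

// update succeeds only if the caller's revision matches the stored one.
func (s *store) update(rc remoteCluster) error {
	cur, ok := s.items[rc.Name]
	if !ok {
		return errors.New("not found")
	}
	if cur.Revision != rc.Revision {
		return errRevisionMismatch
	}
	s.create(rc)
	return nil
}

// getThenUpdate reproduces the Get -> Update sequence from the description.
func getThenUpdate() error {
	backend := newStore("backend")
	cache := newStore("cache")

	created := backend.create(remoteCluster{Name: "leaf"}) // revision "backend-1"
	cache.create(created)                                  // cache overwrites it with "cache-1"

	item := cache.items["leaf"] // Get(name) served from the cache
	return backend.update(item) // Update(item) is checked against "backend-1"
}

func main() {
	fmt.Println(getThenUpdate()) // prints "revision mismatch"
}
```

The update fails because the object read from the cache carries the cache's own revision, while the backend still holds revision A.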
The fix switches from the legacy collection to the new collection backend, which bypasses the storage layer that overrides the revision during caching. Objects are stored directly in a btree.
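A minimal sketch of the idea behind the fix, assuming a cache that stores objects exactly as received. The resource and cacheStore names are hypothetical, and a plain map stands in for the btree to keep the example dependency-free; it is not the actual collection implementation.

```go
package main

import "fmt"

// resource is a hypothetical, simplified cached object.
type resource struct {
	Name     string
	Revision string
}

// cacheStore keeps objects exactly as received, with no write layer
// in between that could mint a new revision. A map stands in for the btree.
type cacheStore struct {
	items map[string]resource
}

// put stores the object unchanged, including its revision.
func (c *cacheStore) put(r resource) {
	c.items[r.Name] = r
}

// get returns the stored object and whether it was found.
func (c *cacheStore) get(name string) (resource, bool) {
	r, ok := c.items[name]
	return r, ok
}

func main() {
	c := &cacheStore{items: map[string]resource{}}
	c.put(resource{Name: "leaf", Revision: "rev-A"})

	got, _ := c.get("leaf")
	fmt.Println(got.Revision == "rev-A") // prints "true"
}
```

Because the cached copy keeps the backend's revision, a later conditional update using that copy compares equal revisions.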
v18 and later are not affected.
changelog: Resolved an issue where RemoteCluster objects stored in the cache had incorrect revisions, causing Update calls to fail.