Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Server-side diff shows incorrect diffs for list related changes #683

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pjiang-dev
Copy link

@pjiang-dev pjiang-dev commented Feb 12, 2025

fixes argoproj/argo-cd#21817

The current Difference function does not remove every field and its children from the set. This causes unwanted the behavior of nested fields not being removed resulting in an incorrect server side diff. Using the RecursiveDifference function instead will do this:
"this removes every field and its children from s that is contained in s2."

Added additional unit test to cover the nested fields scenario.
Before fix:
Screenshot 2025-02-12 at 10 55 32 AM

After fix:
Screenshot 2025-02-12 at 10 55 10 AM

@pjiang-dev pjiang-dev requested a review from a team as a code owner February 12, 2025 02:06
labels:
app: nested-test
spec:
containers:

Check warning

Code scanning / SonarCloud

Service account permissions should be restricted Medium test

Bind this resource's automounted service account to RBAC or disable automounting. See more on SonarQube Cloud
app: nested-test
spec:
containers:
- name: main-container

Check warning

Code scanning / SonarCloud

Memory limits should be enforced Medium test

Specify a memory limit for this container. See more on SonarQube Cloud
labels:
app: nested-test
spec:
containers:

Check warning

Code scanning / SonarCloud

Service account permissions should be restricted Medium test

Bind this resource's automounted service account to RBAC or disable automounting. See more on SonarQube Cloud
app: nested-test
spec:
containers:
- name: main-container

Check warning

Code scanning / SonarCloud

Memory limits should be enforced Medium test

Specify a memory limit for this container. See more on SonarQube Cloud
Copy link

codecov bot commented Feb 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.36%. Comparing base (8849c3f) to head (e8c9005).
Report is 28 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #683      +/-   ##
==========================================
- Coverage   54.26%   53.36%   -0.90%     
==========================================
  Files          64       64              
  Lines        6164     6412     +248     
==========================================
+ Hits         3345     3422      +77     
- Misses       2549     2714     +165     
- Partials      270      276       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andrii-korotkov-verkada
Copy link
Contributor

Interesting, Sonar should have ignored the test files https://github.com/argoproj/gitops-engine/blob/master/sonar-project.properties#L2.

@andrii-korotkov-verkada
Copy link
Contributor

Can you try adding spaces around = in sonar-project.properties, please?

@andrii-korotkov-verkada
Copy link
Contributor

Ah, looks like cloud doesn't support blobs https://docs.sonarsource.com/sonarqube-server/latest/project-administration/analysis-scope/#wildcard-examples. I can't find exact explanation what that means, but would try replacing pkg/diff/testdata/** with pkg/diff/testdata/**/* as well.

@pjiang-dev pjiang-dev force-pushed the pjiang-dev/ssd-fix branch 2 times, most recently from cb8adef to a82a668 Compare February 12, 2025 18:25
@pjiang-dev
Copy link
Author

Ah, looks like cloud doesn't support blobs https://docs.sonarsource.com/sonarqube-server/latest/project-administration/analysis-scope/#wildcard-examples. I can't find exact explanation what that means, but would try replacing pkg/diff/testdata/** with pkg/diff/testdata/**/* as well.

Strange, tried a couple different wildcards and they don't exclude the testdata files

@andrii-korotkov-verkada
Copy link
Contributor

Did you try adding spaces around = ? Tho I swear it worked for me before just like it was in master.

@pjiang-dev
Copy link
Author

Did you try adding spaces around = ? Tho I swear it worked for me before just like it was in master.

Yes, just added spaces around '=' and still fails Sonar

@andrii-korotkov-verkada
Copy link
Contributor

Hm, now I can only think of two things:

  • Sonar config usage is broken.
  • Files need to be excluded individually.

@zachaller
Copy link

I would imagine part of this issue is with Lists only due to not having merge keys, does RecursiveDifference only do that whole replacement on lists or does it do it for every object type? If it does it for every object type I am not sure that is the correct action but I will differ to @leoluz

@pjiang-dev
Copy link
Author

I would imagine part of this issue is with Lists only due to not having merge keys, does RecursiveDifference only do that whole replacement on lists or does it do it for every object type? If it does it for every object type I am not sure that is the correct action but I will differ to @leoluz

This RecursiveDIfference only works on sets. This is coming from the kubernetes structured-diff library where the original DIfference comes from.

@pjiang-dev
Copy link
Author

There is a bug using this approach when manifest files have a field added by mutation webhooks:
Screenshot 2025-02-12 at 1 58 22 PM

Basically RecursiveDIfference will remove fields even if it is not an exact match. It works by removing subtrees only when there's an exact match at a non-leaf level.

Example:
Added by mutation webhooks set:

.metadata.creationTimestamp
.metadata.labels.peter-test

Managed Fields Set:
.metadata.labels.app.kubernetes.io/instance

In this case .metadata.labels.peter-test will be removed because it matches the field .metadata.labels inside the managed fields set, which is incorrect.

One option is to create a custom Difference function to remove fields at a more granular level.

@andrii-korotkov-verkada
Copy link
Contributor

https://github.com/kubernetes-sigs/structured-merge-diff/blob/master/fieldpath/set.go#L110
Maybe we can ask the community to implement the right variation of the function.

@andrii-korotkov-verkada
Copy link
Contributor

@leoluz, do you think we should implement a custom function or ask structured merge diff devs to do it? I feel like the 2nd is better, though would take some time.

@andrii-korotkov-verkada
Copy link
Contributor

Shall we only remove mfs for leaf nodes or something like that?

pkg/diff/diff.go Outdated
@@ -236,15 +236,15 @@ func removeWebhookMutation(predictedLive, live *unstructured.Unstructured, gvkPa
}
if comparison.Added != nil && !comparison.Added.Empty() {
// exclude the added fields owned by this manager from the comparison
comparison.Added = comparison.Added.Difference(mfs)
comparison.Added = comparison.Added.RecursiveDifference(mfs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, I wonder if we can do something like

predictedLive.ExtractItems(plManagedFields, typed.WithAppendKeyFields())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we just remove every prefix of every managed field path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this as well, and it results to messed up outputs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we try to remove managed fields from both live and predicted live before doing the comparison?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nvm, I think we should use RecursiveDifference, but only with leaf mfs. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thought - also remove managed fields from the current live.

Copy link
Author

@pjiang-dev pjiang-dev Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't use RecursiveDifference because it could incorrectly remove fields added by mutation webhooks.
However, yes, I think what we could do is keep Difference, but do a second pass to remove any leaf nodes that are still in managed fields that weren't caught by Difference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try out this and also removing manged fields from current live and see if tit works

@andrii-korotkov-verkada
Copy link
Contributor

In this case .metadata.labels.peter-test will be removed because it matches the field .metadata.labels inside the managed fields set, which is incorrect.

Why would .metadata.labels be a part of managed fields set if not all of it is managed?

@andrii-korotkov-verkada
Copy link
Contributor

One more thought - do a recursive difference, but only if the field is managed in both live and predicted live.

@andrii-korotkov-verkada
Copy link
Contributor

andrii-korotkov-verkada commented Feb 13, 2025

// RecursiveDifference returns a Set containing elements which:
// * appear in s
// * do not appear in s2
//
// Compared to a regular difference,
// this removes every field **and its children** from s that is contained in s2.
//
// For example, with s containing `a.b.c` and s2 containing `a.b`,
// a RecursiveDifference will result in `a`, as the entire node `a.b` gets removed.

In the example

Added by mutation webhooks set:

.metadata.creationTimestamp
.metadata.labels.peter-test

Managed Fields Set:
.metadata.labels.app.kubernetes.io/instance

In this case .metadata.labels.peter-test will be removed because it matches the field .metadata.labels inside the managed fields set, which is incorrect.

I don't think .metadata.labels.peter-test would be removed, unless .metadata.labels is also in the managed field set (perhaps because predicted live only has .metadata.labels.app.kubernetes.io/instance).

@andrii-korotkov-verkada
Copy link
Contributor

So IIUC in the example above predicted live should contain:

.metadata.creationTimestamp
.metadata.labels.peter-test
.metadata.labels.app.kubernetes.io/instance

But I don't see why .metadata.labels would be in the managed fields set.

@andrii-korotkov-verkada
Copy link
Contributor

Do I understand correctly, that the problem is with partially managed fields (with children) being considered as fully managed?

@andrii-korotkov-verkada
Copy link
Contributor

andrii-korotkov-verkada commented Feb 13, 2025

Can you also try just this modification, still with using Difference?

Now:

	if comparison.Added != nil && !comparison.Added.Empty() {
		typedPredictedLive = typedPredictedLive.RemoveItems(comparison.Added)
	}

Desired?

	if comparison.Added != nil && !comparison.Added.Empty() {
                // Might extract some key fields still used
                liveAddValues := typedLive.ExtractItems(comparison.Added, typed.WithAppendKeyFields()) 
		typedPredictedLive = typedPredictedLive.RemoveItems(comparison.Added)
                typedPredictedLive, _ = typedPredictedLive.Merge(liveAddValues)
	}

@pjiang-dev
Copy link
Author

pjiang-dev commented Feb 13, 2025

Here is what's happening exactly:

Before Difference() on Line 239
mfs:

.metadata.labels.app.kubernetes.io/instance
.spec.selector
.spec.template.metadata.labels.app.kubernetes.io/name
.spec.template.spec.containers[name="idle"]
.spec.template.spec.containers[name="idle"].image
.spec.template.spec.containers[name="idle"].name
.spec.template.spec.containers[name="idle"].env[name="FOO"]
.spec.template.spec.containers[name="idle"].env[name="FOO"].name
.spec.template.spec.containers[name="idle"].env[name="FOO"].value
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"]
.spec.template.spec.containers[name="idle"].ports[containerPort=9876,protocol="TCP"]
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"].containerPort
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"].name
.spec.template.spec.containers[name="idle"].ports[containerPort=9876,protocol="TCP"].containerPort
.spec.template.spec.containers[name="idle"].ports[containerPort=9876,protocol="TCP"].name

Comparison:

- Modified Fields:
.metadata.generation
.metadata.managedFields
- Added Fields:
.metadata.creationTimestamp
.metadata.labels.peter-test
.spec.template.spec.containers[name="idle"].env
.spec.template.spec.containers[name="idle"].env[name="FOO"]
.spec.template.spec.containers[name="idle"].env[name="FOO"].name
.spec.template.spec.containers[name="idle"].env[name="FOO"].value
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"]
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"].containerPort
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"].name
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"].protocol

After Difference() on line 239
Comparison:

- Modified Fields:
.metadata.generation
.metadata.managedFields
- Added Fields:
.metadata.creationTimestamp
.metadata.labels.peter-test
.spec.template.spec.containers[name="idle"].env
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"].protocol

The bug is that these two fields remain in comparison.Added even after the Difference function because they are not managed fields but their children are. If they are in comparison.Added they will be extracted from predictedLive and cause the diff to not show these fields.

.spec.template.spec.containers[name="idle"].env
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"].protocol

The cause is the Difference function will does a shallow difference and does not recursively check each child.

@pjiang-dev
Copy link
Author

Can you also try just this modification, still with using Difference?

Now:

	if comparison.Added != nil && !comparison.Added.Empty() {
		typedPredictedLive = typedPredictedLive.RemoveItems(comparison.Added)
	}

Desired?

	if comparison.Added != nil && !comparison.Added.Empty() {
                // Might extract some key fields still used
                liveAddValues := typedLive.ExtractItems(comparison.Added, typed.WithAppendKeyFields()) 
		typedPredictedLive = typedPredictedLive.RemoveItems(comparison.Added)
                typedPredictedLive, _ = typedPredictedLive.Merge(liveAddValues)
	}

This one also did not make a difference. I think because we are still extracting those fields.

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
C Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@andrii-korotkov-verkada
Copy link
Contributor

andrii-korotkov-verkada commented Feb 13, 2025

Great analysis!

.spec.template.spec.containers[name="idle"].env

Looks like fields that are parents of other fields are not managed themselves.
We run into a tricky situation when they were added, but only some of their children are managed.
In such case, after removing managed fields, these kind of fields would still remain, so we shouldn't touch them.
In other words, we should remove a non-managed field if and only if all its children are managed.

.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"].protocol

This is due to a key field not specified explicitly, but it can be derived from the path.
It's actually an easier case, since its parent is managed.
We should remove a non-managed field if its parent subpath is managed (e.g. .spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"]).

@andrii-korotkov-verkada
Copy link
Contributor

The solution is:
We compute the leaves of added fields before the mfs subtraction loop. After the loop we go through the remaining added fields and add some fields to a new excluded field set:

  • If it's a leaf, we add all of its subpaths to this field set.
  • If any of its subpaths is managed, we add the path itself to this set.
    Then we subtract this field set from the added fields.

@pjiang-dev
Copy link
Author

@andrii-korotkov-verkada I think this is a good solution. Basically it covers the two edge cases
Case 1: Parent fields with managed children
.spec.template.spec.containers[name="idle"].env
(children like .env[name="FOO"] are managed)

Case 2: Derived fields with managed parents
.spec.template.spec.containers[name="idle"].ports[containerPort=3000,protocol="TCP"].protocol
(parent path is managed)

I can start implementing this and @leoluz can confirm if it makes sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Server-side diff shows incorrect diffs for list related changes
3 participants