Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for mutating webhooks with side effects when batch dry-run server-side apply #798

Merged

Conversation

timothysmith0609
Copy link
Contributor

@timothysmith0609 timothysmith0609 commented Feb 2, 2021

This PR adds some smarts to the batch dry-runner. In particular, it cross-references the supplied list of resources with those that have mutating admission webhooks which are not dry-run safe. Resources that we know will fail batch dry-running are thus pre-emptively added to the legacy dry-run flow.

Comment on lines 291 to 298
groups.each do |group|
versions.each do |version|
kinds.each do |kind|
individuals += resources.select do |r|
(r.group == group || group == '*' || match_policy == "Equivalent") &&
(r.version == version || version == '*' || match_policy == "Equivalent") &&
(r.type.downcase == kind.downcase)
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a straightforward implementation of the rules defined here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how to ask the api server what GVKs it knows to convert between? Not saying we use them, but this adds to our list of things that break with non-unique kinds.

Copy link
Contributor Author

@timothysmith0609 timothysmith0609 Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how to ask the api server what GVKs it knows to convert between?

From my understanding, the Equivalent condition simply disregards gv (I'm tempted to open an upstream issue since this can have undesirable side-effects as we've found). I don't think this is something for Krane to deal with.

From the official reference:

Equivalent: match a request if modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would be converted to apps/v1 and sent to the webhook.

Comment on lines 546 to 553
deploy_fixtures('hello-cloud', subset: %w(web.yml.erb secret.yml), render_erb: true) do |fixtures|
container = fixtures['web.yml.erb']['Deployment'][0]['spec']['template']['spec']['containers'][0]
container['volumes'] = [
name: 'secret',
secret: {
secretName: fixtures['secret.yml']["Secret"][0]['metadata']['name'],
},
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not fail the dry-run, as I would have thought, even though the Secret resource is not part of the dry-run apply (and is not present on the cluster beforehand). I will try a few more failure scenarios to see if I can actually induce this type of transitive failure

end
batchable_resources, individuals = partition_dry_run_resources(resources)
batch_dry_run_success = kubectl.server_dry_run_enabled? && validate_dry_run(batchable_resources)
individuals += batchable_resources unless batch_dry_run_success
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we still fall back to per-resource dry-running if the batch fails

@timothysmith0609 timothysmith0609 marked this pull request as ready for review February 2, 2021 21:18
@timothysmith0609 timothysmith0609 requested a review from a team as a code owner February 2, 2021 21:18
Copy link
Contributor

@dturn dturn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small thoughts but looking like its on the right direction.

lib/krane/kubernetes_resource.rb Outdated Show resolved Hide resolved
individuals = []
mutating_webhooks = cluster_resource_discoverer.fetch_mutating_webhook_configurations
mutating_webhooks.each do |spec|
spec.dig('webhooks').each do |webhook|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me think we want a MutatingWebhookConfigurations class instead of putting all of the logic in this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few attempts at this. Modelling it as a KubernetesResource didn't seem right, but neither did modelling it in the top-level namespace. If I had to make a choice, I'm tempted to do the latter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe its time for an Internal namespace for objects that are K8s objects but aren't designed for having deploy logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm loath to make the separation... ultimately this is still just a kubernetes object. For now, I've made it a subclass of KubernetesResource and added the appropriate overrides should one of these ever be deployed.

Comment on lines 291 to 298
groups.each do |group|
versions.each do |version|
kinds.each do |kind|
individuals += resources.select do |r|
(r.group == group || group == '*' || match_policy == "Equivalent") &&
(r.version == version || version == '*' || match_policy == "Equivalent") &&
(r.type.downcase == kind.downcase)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how to ask the api server what GVKs it knows to convert between? Not saying we use them, but this adds to our list of things that break with non-unique kinds.

(r.version == version || version == '*' || match_policy == "Equivalent") &&
(r.type.downcase == kind.downcase)
end
resources -= individuals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can never remember the pass by semantics of ruby but, Is resources a copy or does this impact validate_resources

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pass by reference, however the -= operator returns a new object, not a mutation of the old one. If we used, e.g. <<, then we would mutate the incoming object. As a matter of caution, I'll dup the resources array to avoid any issues in the future.

Copy link
Contributor

@dturn dturn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions about the tests, code looks good

test/integration-serial/serial_deploy_test.rb Outdated Show resolved Hide resolved

def test_resources_with_side_effect_inducing_webhooks_are_not_batched_server_side_dry_run
resp = mutating_webhook_fixture(File.join(fixture_path("mutating_webhook_configurations"), "ingress_hook.json"))
Krane::ClusterResourceDiscovery.any_instance.expects(:fetch_mutating_webhook_configurations).returns(resp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not actually deploy this into the cluster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize I could just set failurePolicy: ignore with a small timeout to circumvent the need for an actual webhook server. I've changed the tests to actually deploy a real MutatingWebhookConfiguration and use that to test things out.

Worth noting that the partitioning logic is only really necessary for v1beta1 resources, since v1 only allows (and requires) either None or NoneOnDryRun to be specified for the sideEffects field.

], in_order: true)
end

def test_resources_with_side_effect_inducing_webhooks_are_not_batched_server_side_dry_run
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing how this test verifies are_not_batched

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fleshed out this test a bit more

test/integration-serial/serial_deploy_test.rb Outdated Show resolved Hide resolved
Copy link
Contributor

@dturn dturn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@karanthukral karanthukral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the delayed review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants