-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Account for mutating webhooks with side effects when batch dry-run server-side apply #798
Account for mutating webhooks with side effects when batch dry-run server-side apply #798
Conversation
lib/krane/deploy_task.rb
Outdated
groups.each do |group| | ||
versions.each do |version| | ||
kinds.each do |kind| | ||
individuals += resources.select do |r| | ||
(r.group == group || group == '*' || match_policy == "Equivalent") && | ||
(r.version == version || version == '*' || match_policy == "Equivalent") && | ||
(r.type.downcase == kind.downcase) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a straightforward implementation of the rules defined here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea how to ask the api server what GVKs it knows to convert between? Not saying we use them, but this adds to our list of things that break with non-unique kinds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea how to ask the api server what GVKs it knows to convert between?
From my understanding, the Equivalent
condition simply disregards gv (I'm tempted to open an upstream issue since this can have undesirable side-effects as we've found). I don't think this is something for Krane to deal with.
From the official reference:
Equivalent: match a request if modifies a resource listed in rules, even via another API group or version. For example, if deployments can be modified via apps/v1, apps/v1beta1, and extensions/v1beta1, and "rules" only included apiGroups:["apps"], apiVersions:["v1"], resources: ["deployments"], a request to apps/v1beta1 or extensions/v1beta1 would be converted to apps/v1 and sent to the webhook.
deploy_fixtures('hello-cloud', subset: %w(web.yml.erb secret.yml), render_erb: true) do |fixtures| | ||
container = fixtures['web.yml.erb']['Deployment'][0]['spec']['template']['spec']['containers'][0] | ||
container['volumes'] = [ | ||
name: 'secret', | ||
secret: { | ||
secretName: fixtures['secret.yml']["Secret"][0]['metadata']['name'], | ||
}, | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not fail the dry-run, as I would have thought, even though the Secret resource is not part of the dry-run apply (and is not present on the cluster beforehand). I will try a few more failure scenarios to see if I can actually induce this type of transitive failure
end | ||
batchable_resources, individuals = partition_dry_run_resources(resources) | ||
batch_dry_run_success = kubectl.server_dry_run_enabled? && validate_dry_run(batchable_resources) | ||
individuals += batchable_resources unless batch_dry_run_success |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, we still fall back to per-resource dry-running if the batch fails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small thoughts but looking like its on the right direction.
lib/krane/deploy_task.rb
Outdated
individuals = [] | ||
mutating_webhooks = cluster_resource_discoverer.fetch_mutating_webhook_configurations | ||
mutating_webhooks.each do |spec| | ||
spec.dig('webhooks').each do |webhook| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think we want a MutatingWebhookConfigurations
class instead of putting all of the logic in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a few attempts at this. Modelling it as a KubernetesResource
didn't seem right, but neither did modelling it in the top-level namespace. If I had to make a choice, I'm tempted to do the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe its time for an Internal
namespace for objects that are K8s objects but aren't designed for having deploy logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm loath to make the separation... ultimately this is still just a kubernetes object. For now, I've made it a subclass of KubernetesResource
and added the appropriate overrides should one of these ever be deployed.
lib/krane/deploy_task.rb
Outdated
groups.each do |group| | ||
versions.each do |version| | ||
kinds.each do |kind| | ||
individuals += resources.select do |r| | ||
(r.group == group || group == '*' || match_policy == "Equivalent") && | ||
(r.version == version || version == '*' || match_policy == "Equivalent") && | ||
(r.type.downcase == kind.downcase) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea how to ask the api server what GVKs it knows to convert between? Not saying we use them, but this adds to our list of things that break with non-unique kinds.
lib/krane/deploy_task.rb
Outdated
(r.version == version || version == '*' || match_policy == "Equivalent") && | ||
(r.type.downcase == kind.downcase) | ||
end | ||
resources -= individuals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can never remember the pass by semantics of ruby but, Is resources a copy or does this impact validate_resources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pass by reference, however the -=
operator returns a new object, not a mutation of the old one. If we used, e.g. <<
, then we would mutate the incoming object. As a matter of caution, I'll dup
the resources
array to avoid any issues in the future.
Co-authored-by: Danny Turner <[email protected]>
…://github.com/Shopify/krane into server_dry_run_mutating_webhooks_side_effects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions about the tests, code looks good
lib/krane/kubernetes_resource/mutating_webhook_configuration.rb
Outdated
Show resolved
Hide resolved
|
||
def test_resources_with_side_effect_inducing_webhooks_are_not_batched_server_side_dry_run | ||
resp = mutating_webhook_fixture(File.join(fixture_path("mutating_webhook_configurations"), "ingress_hook.json")) | ||
Krane::ClusterResourceDiscovery.any_instance.expects(:fetch_mutating_webhook_configurations).returns(resp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not actually deploy this into the cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize I could just set failurePolicy: ignore
with a small timeout to circumvent the need for an actual webhook server. I've changed the tests to actually deploy a real MutatingWebhookConfiguration
and use that to test things out.
Worth noting that the partitioning logic is only really necessary for v1beta1
resources, since v1
only allows (and requires) either None
or NoneOnDryRun
to be specified for the sideEffects
field.
], in_order: true) | ||
end | ||
|
||
def test_resources_with_side_effect_inducing_webhooks_are_not_batched_server_side_dry_run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing how this test verifies are_not_batched
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fleshed out this test a bit more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the delayed review
This PR adds some smarts to the batch dry-runner. In particular, it cross-references the supplied list of resources with those that have mutating admission webhooks which are not dry-run safe. Resources that we know will fail batch dry-running are thus pre-emptively added to the legacy dry-run flow.