Add support for labels #235
Conversation
Force-pushed from 591cddc to 17dc0fb
exe/kubernetes-deploy
Outdated
@@ -23,6 +24,9 @@ ARGV.options do |opts|
opts.on("--no-prune", "Disable deletion of resources that do not appear in the template dir") { prune = false }
opts.on("--template-dir=DIR", "Set the template dir (default: config/deploy/$ENVIRONMENT)") { |v| template_dir = v }
opts.on("--verbose-log-prefix", "Add [context][namespace] to the log prefix") { verbose_log_prefix = true }
opts.on("--selector=key1=value1,key2=value2", "Use selector to filter which resources to prune") do |s|
Since the --selector argument is used for different purposes in a Kubernetes context, wouldn't it be better if the help message were something more generic? It could be phrased in a way that makes clear to the user that it is not used only by the prune operation, so that in the future it can be extended and used by other actions/processes.
For example: "Use selector to filter and restrict the resources affected by the desired operation".
Reference:
https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
The documentation makes it sound like this flag not only restricts which resources are pruned, but also which ones are deployed:
# Apply the configuration in manifest.yaml that matches label app=nginx and delete all the other resources
# that are not in the file and match label app=nginx.
kubectl apply --prune -f manifest.yaml -l app=nginx
If so, it's more like "Deploy only resources matching the provided selector. If pruning is enabled, only resources matching the provided selector will be pruned."
I'm ok with only supporting =. I could easily be missing a use case, but filtering on != doesn't make much sense to me in this context.
This is a neat feature. I think there are a few more things to do to get it working predictably though. If I understand correctly, the --selector flag affects not only what is pruned, but also what is deployed. If so, I think we have two options for target behaviours:
1 - The kubernetes-deploy --selector flag affects everything, including secrets, predeployed resources and resources that have a deploy_method other than apply (the last two are not covered by your PR). This means that if your template is selected, it will be deployed. If it is not selected, it will be completely ignored. If a live resource is selected and is not in your templates, it will be pruned.
2 - The kubernetes-deploy --prune-selector flag affects pruning only. Everything in your deploy directory will be deployed, but only live resources that are selected and are not in your templates will be pruned.
Which did you have in mind?
Either way, secrets should behave the same way as any other resource IMO: we should not automatically add labels, but instead add a new ejson field where you can specify them.
lib/kubernetes-deploy/utils.rb
Outdated
module Utils
  def self.labels_to_string(labels)
    return "" if labels.nil?
    labels.each_pair.map { |k, v| "#{k}=#{v}" }.join(",")
nit: I think labels.map (without each_pair) would do the same thing
oh huh til
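For reference, a quick sketch of the equivalence (the labels hash here is made up for illustration):

labels = { "app" => "web", "branch" => "master" }

# Hash#map yields the same key/value pairs as each_pair, so both forms build the same selector string:
labels.each_pair.map { |k, v| "#{k}=#{v}" }.join(",") # => "app=web,branch=master"
labels.map { |k, v| "#{k}=#{v}" }.join(",")           # => "app=web,branch=master"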
test/fixtures/branched/web.yml.erb
Outdated
app: branched
branch: <%= branch %>
annotations:
  shipit.shopify.io/restart: "true"
Should we modify the restart command too?
It actually doesn't really make sense to add this, because restart just uses the name.
There are actually two ways to use the restart command: give it a list of deployment names, or restart everything in the namespace that has this annotation on it. The latter is actually a lot more common (at Shopify).
ohh of course. I'll add it
# Run again without selector to verify pruning works
assert_deploy_success(deploy_fixtures("branched", bindings: { "branch" => "master" }))
deployments = v1beta1_kubeclient.get_deployments(namespace: @namespace, label_selector: "app=branched")
suggestion: you could use the branch selector and more specifically assert that the correct thing was pruned (1 master deployment, 0 staging deployments).
I'll assert that the label of the deployment matches what we expect
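For illustration, a rough sketch of that stricter check using the same kubeclient helper shown above (the expected counts assume the master deployment survives and the staging one is pruned):

master_deployments = v1beta1_kubeclient.get_deployments(namespace: @namespace, label_selector: "branch=master")
staging_deployments = v1beta1_kubeclient.get_deployments(namespace: @namespace, label_selector: "branch=staging")
assert_equal 1, master_deployments.size  # the master deployment is still there
assert_equal 0, staging_deployments.size # the staging deployment was pruned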
assert_deploy_failure(deploy_fixtures("branched",
  bindings: { "branch" => "master" },
  selector: { "branch" => "staging" }))
assert_logs_match(/error: no objects passed to create/)
Could you paste the full failure output from this test? (Use PRINT_LOGS=1 to get it.) I usually like to use assert_logs_match_all with in_order: true to be more strict about where the error appeared (i.e. nicely displayed vs. in an exception that raised all the way out).
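Something along these lines (a sketch of the suggested assertion style, not the final test code):

assert_logs_match_all([
  /error: no objects passed to create/,
  # ...additional phrases from the failure output would go here, in the order they appear
], in_order: true)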
test/fixtures/branched/web.yml.erb
Outdated
progressDeadlineSeconds: 20
selector:
  matchLabels:
    name: web
If you were to deploy this with selector name=web, would it get deployed? I'd guess not, but would like to confirm my understanding of how it's doing the filtering.
I believe it would not, because that selector applies to the top-level resource (in this case the Deployment), which doesn't have that label.
Next question -- scenario:
- Person A does a deploy that successfully ships this template
- Person B does a deploy with the name=web selector, which ignores this template for deploy purposes. However, pods are allowed to be pruned, and the pods created by this template in person A's deploy do match the selector. What happens to them?
Things only get pruned when they're deployed by apply or create, right? I think kubectl apply looks at an annotation, because otherwise it would always prune everything when you don't pass in a selector.
Yes, it looks for the presence of a specific annotation that is only added by apply (not create). You're right, that annotation isn't conveyed to the dependent resources, so they'll be fine. 🤦♀️
command = ["create", "-f", file_path, "--dry-run", "--output=name"] | ||
command.push("--selector", Utils.labels_to_string(selector)) if selector |
If we run the command with a selector, but the resource in question isn't selected, won't it throw an error? We definitely need test coverage for this.
Yes, the "error: no objects passed to create" error will happen, which I have a test for. It's a bit gross though; need to figure out how to communicate that better to the user.
Ah, I see. I thought that error message in the test was coming from apply. So if less than 100% of the templates in your deploy directory are covered by the selector, the deploy fails... That seems reasonable, actually. We could specifically test that nothing gets deployed (even predeployed resources) if anything doesn't match the selector. I agree that we should transform the error though. Maybe even print out the problematic template's metadata.
I'll add something to the resource validation that specifically checks for this case
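A rough idea of what that check could look like in the resource validation (method and attribute names here are assumptions, not the merged implementation):

# Reject a template whose top-level labels don't satisfy the selector,
# instead of letting the kubectl dry-run fail with a confusing error.
def validate_selector(selector)
  mismatches = selector.reject { |key, value| labels && labels[key] == value }
  return if mismatches.empty?
  @validation_error_msg = "selector #{Utils.labels_to_string(selector)} " \
    "does not match labels #{Utils.labels_to_string(labels)}"
end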
Thanks for the reply and detailed review! I'll address the review comments individually. About how this should work with filtering: right now, if a template is included in the deployment that doesn't match the selector, the kubectl command fails when doing the validation with that selector. Re: non-apply resources, I didn't realize those exist and will have to think about how to use those... They might not really fit in with the selector stuff, because they're not pruned anyway (if I understand correctly). I added an explanation of my use-case to the PR description, for future reference.
💯 that makes sense to me.
I might have been wrong about predeployed resources not being covered by your PR. We'll need to add tests to be sure. Since all resources are validated up-front, the fact that validation uses selectors should stop the predeploy too. For non-apply resources, you're right that they aren't pruned... IMO it's actually a bug that we hadn't noticed because none of them are currently in the prune whitelist anyway (arguably PodDisruptionBudget should be though). That bug can be out of scope of this PR. cc @stefanmb did you run into the
Force-pushed from 3c3e198 to c23599c
@KnVerey I've addressed your comments.
I don't think adding an argument to restart makes sense, as that just uses the name to find the resource. Here's a run with
exe/kubernetes-deploy
Outdated
@@ -23,6 +24,10 @@ ARGV.options do |opts|
opts.on("--no-prune", "Disable deletion of resources that do not appear in the template dir") { prune = false }
opts.on("--template-dir=DIR", "Set the template dir (default: config/deploy/$ENVIRONMENT)") { |v| template_dir = v }
opts.on("--verbose-log-prefix", "Add [context][namespace] to the log prefix") { verbose_log_prefix = true }
opts.on("--selector=SELECTOR", "Use selector to filter which resources to prune and apply the templates to. " \
  "All the templates need to match the selector or the command will fail (format: k1=v1,k2=v2 or JSON)") do |s|
"Filter which resources ... to apply the templates to" doesn't quite make sense. Suggestion: "Prevent resources that do not match the given selector from being deployed or pruned. All the templates in your template dir need to match...", or maybe "Ensure that all resources in your template dir match the given selector, and restrict pruning to live resources it selects. (format:...)"
I like the second one 👍
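Concretely, the registration could then read something like this (a sketch using that second wording; the block body is simplified here):

opts.on("--selector=SELECTOR", "Ensure that all resources in your template dir match the given selector, " \
  "and restrict pruning to live resources it selects (format: k1=v1,k2=v2 or JSON)") do |s|
  selector = s # parsing of the k1=v1,k2=v2 / JSON value happens as elsewhere in this PR
end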
@@ -137,13 +139,16 @@ def generate_secret_yaml(secret_name, secret_type, data)
encoded[secret_key] = Base64.strict_encode64(value)
end

labels = { "name" => secret_name }
labels.merge!(@selector) if @selector
I agree with your earlier comments that this could lead to unexpected overwrites, and IMO secrets should behave the same way as any other resource. This means making it possible to specify labels in the ejson, and validating all of the secrets against @selector (in addition to the current validations) ahead of creation, ideally at the same time as the other resources.
Hmm that makes sense. We'll have to support ERB for EJSON then too, because you'd want to be able to prefix secret names dynamically to ensure they're not overwritten by other deployments.
I just did some testing and I'm afraid this isn't really going to work. If we have it look something like this:
// secrets.ejson.erb
{
"_public_key": "65f79806388144edf800bf9fa683c98d3bc9484768448a275a35d398729c892a",
"kubernetes_secrets": {
"<%= branch %>-catphotoscom": {
"_type": "kubernetes.io/tls",
"_labels": {
"branch": "<%= branch %>"
},
"data": {
"tls.crt": "EJ[1:N0RQ6a1BcGJ7w/1ABQUC3C2DQMP+qWuc10LoNUQOiUg=:Z3AW84E4MTa3ntO9jHqVAFy3cpDVF3tC:BpMYh76OGkBIi0BKlFG1gZnkSGGCmwLUX5Fk+BkUCn0mN/X5Nrvs]",
"tls.key": "EJ[1:N0RQ6a1BcGJ7w/1ABQUC3C2DQMP+qWuc10LoNUQOiUg=:W9F16TEKrtF1rdXaBaRv6bX7uIOB0SJi:O6kgaj5mly05JXjFt10eo6sj6ZBNzvNxQrPBBQ1i0g==]"}
}
}
}
Then the problem is that ejson will encrypt "branch": "<%= branch %>", even though it's under an underscore key. Probably a bug worth fixing in ejson.
What do you think is the best way forward with this then?
I think that's because underscore keys don't propagate downwards; you'd need to use _branch rather than _labels. Thinking about this further, I'm not 100% sure I like the idea of being able to use ERB in a secrets file. But we do need some way of making the behaviour predictable, including if two selector-based deploys were to happen at the same time to the same namespace. Perhaps an MVP option would be to automatically disable pruning if a selector is given, and print a warning, e.g. "Secret pruning is disabled when deploying with --selector because labels are not yet supported by the secret management feature." Wdyt?
Ignore below, it complicates matters needlessly
As a work-around could we include the selector JSON as a string literal to be parsed? For example, instead of:
{
"_public_key": "65f79806388144edf800bf9fa683c98d3bc9484768448a275a35d398729c892a",
"kubernetes_secrets": {
"foo": {
"type": "kubernetes.io",
"_labels": {
"branch": "foo"
}
}
}
}
We can use:
{
"_public_key": "65f79806388144edf800bf9fa683c98d3bc9484768448a275a35d398729c892a",
"kubernetes_secrets": {
"foo": {
"type": "EJ[1:OrPRTUi2rTBE6Lfci5FHIoJSYK36dBwC7JYU7g8QzWw=:00UAkBOnyrR57I5Zc2oT0f8QStFHSCHb:6hQz9FzuUV0z18gjxv2028Csw8kQOFDDZM6whuI=]",
"_labels": "{\"branch\": \"foo\"}"
}
}
}
... you'd need to use _branch rather than _labels
Just tried this out and it works, seems to be the better idea.
I'm not opposed to ERB secrets; I don't see why template variables should be allowed in other resources but not in secrets.
end

def labels
  @definition.dig("metadata", "labels")
I just used test/fixtures/ejson-cloud/web.yaml to test out how apply --selector interacts with defaulting. If you do not specify any labels at all on the deployment, they will be inherited from the pod template. It turns out that apply with --selector does not handle that nicely. As a result, the following can happen:
- Try to apply that template with --selector=app=ejson-cloud. It will fail.
- Apply it without a selector. It will succeed, and the live copy will have the app=ejson-cloud selector.
- Apply something else with the app=ejson-cloud selector at the top level, using --prune --selector=app=ejson-cloud. It will prune the original deployment.
In other words, you can end up pruning a resource you were unable to create. Not much we can do about that, but seems like a gotcha.
In API version apps/v1, .spec.selector and .metadata.labels do not default to .spec.template.metadata.labels if not set. So they must be set explicitly. Also note that .spec.selector is immutable after creation of the Deployment in apps/v1.
This is from https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#selector, so we should probably not do anything special for this
@@ -99,6 +99,54 @@ def test_pruning_disabled
  hello_cloud.assert_all_redis_resources_up
end

def test_selector
  ejson_cloud = FixtureSetAssertions::EjsonCloud.new(@namespace)
  ejson_cloud.create_ejson_keys_secret
What's this here for?
assert_logs_match_all([
  /Template validation failed/,
  /Invalid template: Deployment/,
  /selector branch=staging does not match labels app=branched,branch=master/,
I see in the output that you pasted that this appears under "> Error from kubectl:", which is a bit confusing. Now that there are other sources of errors, let's make that more generic.
There's another validator in kubernetes_resource.rb already, that has the same awkward result right now. I can think of a fix though
True. Should be a matter of simply changing the string here though: https://github.com/Shopify/kubernetes-deploy/blob/master/lib/kubernetes-deploy/deploy_task.rb#L263
I'll just change it to "> Error:", as it's already clear from the context that it's to do with template validation
Force-pushed from b1a7c63 to 5e04043
xhyve is deprecated
Approach LGTM with some comments re secrets.
ARGV.options do |opts|
opts.on("--deployments=LIST") { |v| raw_deployments = v.split(",") }
opts.on("--deployments=LIST") { |v| deployments = v.split(",") }
opts.on("--selector=SELECTOR", "Restarts deployments matching selector (format: k1=v1,k2=v2 or JSON)") do |s|
Nit: should standardize the spacing/punctuation/formatting of this message with exe/kubernetes-deploy's:
... matching selector (format: k1=v1,k2=v2 or JSON)")
... it selects. (format: k1=v1,k2=v2 or JSON)")
@bouk Are you still working on this?
We didn't end up needing this so I'm going to close this.
This PR adds support for the --selector flag, which makes it possible to restrict what resources get pruned on a deployment. It takes in a label selector which is then passed on to kubectl apply --prune to restrict which things to delete.

Some unanswered questions: should selectors like key!=value also be supported?

Use-case

I want to deploy multiple branches of an application within the same namespace, sharing some resources. The project layout would be something like this:
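(Directory names here are illustrative:)

config/deploy/
├── staging           # shared resources, each labeled shared-resource: 'true'
└── staging-branch    # per-branch resources, each labeled branch: <%= branch %>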
Where I'd run ENVIRONMENT=staging kubernetes-deploy --selector=shared-resource=true ... to deploy the shared resources, and then ENVIRONMENT=staging-branch kubernetes-deploy --selector=branch=name --bindings=branch=name ... to deploy a single branch. All the resources in the staging folder would have a shared-resource: 'true' label defined, and the things in the staging-branch folder would have branch: <%= branch %> in their labels.

Right now this isn't possible because the pruning would delete things from the other deployments. After this PR it should be possible.