-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deploy secrets in DeployTask like other resources #424
Conversation
Test failures on CI are due to small output differences across k8s versions, I'll fix those. |
I had been working on exactly the same change this morning, was nearly done with it too 😅
This is a requirement for my use case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor thoughts
I need this and #421 quite urgently. Is there anything I can do to help move either forward? I do not have access to buildkite to see what the failures are :/ |
@benlangfeld some context on timelines: I'm actively working on this, but not exclusively, and as I'm separated from my reviewers by a few timezones this will probably take in the order of ~days to complete - not a few hours, but (hopefully) not ~weeks either. Also, sorry for not communicating more loudly I was doing this before you started hacking yourself. |
63bd3aa
to
9cfa46f
Compare
An annoyance to fix later: I couldn't replicate the policial failures locally, so I guess we have some config drift. |
df9a98b
to
57d3e9a
Compare
This is ready for another pass. |
I can't find a reference to this annotation? I would think keeping that annotation is a good idea and would be useful for our setup. |
Secrets created from ejson are currently annotated with
No, it's used by the current implementation to ensure only 'managed' secrets are pruned, but after this the lifecycle of secrets will be managed like any other resource. So I guess the answer to my own question is "no it's not needed", but want to make sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works great for me in a dummy app. I can even deploy Secrets both via EJSON and normal templates in the same deployment. The ejson-keys
Secret is not pruned, but others are pruned correctly. Perfect from my perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need/want the 'managed secret' annotation?
I think it's helpful, and its current text actually still makes sense, but our references to it as a "management" thing should change to avoid confusion for future contributors.
What additional testing would add confidence this isn't going to break anything in the wild?
I don't think there's any additional unit/integration tests we can do unless we decide to make a transitional version (see my first inline comment). Each org is going to need to audit its namespaces for secrets that have been applied. I don't really see a viable way around that requirement other than disabling secret pruning (which I'd rather not do).
Would anyone prefer I separate out the 'sensitive output' refactor?
No, this is fine.
I flagged secrets for predeployment, on the basis that pods deployed later may need the new versions. Is that sound?
Yep!
Still tbd after this: support for adding labels from ejson, and resolving how to 'template' ejson secrets. This PR is big enough already.
Agreed.
end | ||
|
||
def test_can_deploy_template_dir_with_only_secrets_ejson | ||
ejson_cloud = FixtureSetAssertions::EjsonCloud.new(@namespace) | ||
ejson_cloud.create_ejson_keys_secret | ||
assert_deploy_success(deploy_fixtures("ejson-cloud", subset: ["secrets.ejson"])) | ||
assert_logs_match_all([ | ||
"Deploying kubernetes secrets from secrets.ejson", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an assertion on the line you replaced this with either here or in one of the main ejson provisioning tests?
CHANGELOG.md
Outdated
@@ -1,5 +1,8 @@ | |||
## next | |||
|
|||
*Features* | |||
- Support for deploying Secrets from templates ([#424](https://github.com/Shopify/kubernetes-deploy/pull/424)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need a flashy "Breaking change" entry here because of the pruning implications and the critical nature of secrets. These are the cases I've thought of so far:
- If you previously used this gem to deploy secrets from EJSON and the first time commit you deploy using this version removes one or more of them, they will not be pruned.
- We could actually avoid this one by making a transitional release, but I'm not sure it is justified, since the impact is not deleting something, we're not at 1.0, and we have been maintaining this changelog strictly for folks for a long time.
- If you previously manually
kubectl apply
'd secrets that are not passed to kubernetes-deploy, your first deploy using this version is going to delete them- We could potentially make a transitional version log warnings about these, but I kinda doubt people would notice them. Users (including us at Shopify) are going to have to audit their cluster(s) for applied secrets before rolling out this version regardless.
⚠️
- We could potentially make a transitional version log warnings about these, but I kinda doubt people would notice them. Users (including us at Shopify) are going to have to audit their cluster(s) for applied secrets before rolling out this version regardless.
- If you previously passed secrets manifests to kubernetes-deploy (we would have ¯\_(ツ)_/¯ applied them) and they are no longer in the set you pass to the first deploy using this version, it will delete them
Can you think of any others? One case I think actually doesn't cause trouble is deploying with the new version and then rolling back to deploying with the old (as long as we keep our ejson secret annotation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you think of any others?
I cannot 👍
One case I think actually doesn't cause trouble is deploying with the new version and then rolling back to deploying with the old (as long as we keep our ejson secret annotation).
Which is a solid reason to keep the annotation in itself!
Did a 🎩 (PRINT_LOGS=1) of some of the tests and have a couple additional thoughts based on the output below. I think we should:
|
@DazWorrall I'm addressing some of those code review comments in #425 |
f73800a
to
39e9524
Compare
Thanks so much @benlangfeld for your help with these: * Remove excess logging * Stop referring to EJSON secrets as generically "managed" * Consistent timeout for Secret resources Unifying the constant used for simple resources of this type is left as an exercise for another change, mostly because the name of such a thing may be controversial and I don't want to block merging this on detailed review. * Give secrets the same statsd tags as other resources * Removes duplicate spec Doesn't test anything more than test_create_and_update_secrets_from_ejson * Replaces log assertion * Point out breaking changes in CHANGELOG * Include ejson generated secrets in discovery log
* Rebase on master * Fix changelog after rebase * Remove unnecessary logging * Fix unknown Secret status * Update test name to not include 'update' * Write new unrecognized resource test
d0e5bd5
to
0581074
Compare
0581074
to
67bb1b9
Compare
@KnVerey rebased and addressed your feedback. I had to use some unpleasant stubbing to get the tests done but I'm happy we've covered all the known failure states now. I had to add a couple of serial tests - they proved flaky during CI when ran in parallel, but running in isolation they're fine so there's a race somewhere I don't understand.
Sorry I missed this @benlangfeld, I was iterating between meetings and not paying attention to my inbox :( I appreciate the thought! |
No problem. Just doing whatever I can to get this thing merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more small comments, but LGTM.
refute_logs_match("kind: Deployment") # content of the sensitive template | ||
end | ||
|
||
def test_apply_failure_with_sensitive_resources_hides_raw_output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the stubbing that makes the test above concurrency-unfriendly (mocha
isn't threadsafe), but this one looks really normal and should be able to run in parallel. What flakes did you see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These pass for me. aab294c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that commit fails in CI, might I propose that this be merged with the serial tests in place and we come back to this issue at lower priority? I don't think this should be a condition of merging this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok to let the one with stubbing run serially permanently, but I think this one belongs in the regular file. There's not any follow-up work to do--just trying to get them committed in the right place. 😄
secret["type"] = "something/invalid" | ||
end | ||
assert_deploy_failure(result) | ||
refute_logs_match(/Kubectl err:/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is too general. There are a handful of other kubectl commands that get run during the test, some of which might fail and be retried. Maybe that was a source of flakiness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 9e13b30
logger.level = 0 | ||
# An invalid PATCH produces the kind of error we want to catch, so first create a valid secret: | ||
assert_deploy_success(deploy_fixtures("hello-cloud", subset: %w(secret.yml))) | ||
# Then try to PATCH an immutable field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smart! 👏
@@ -302,10 +302,10 @@ def split_templates(filename) | |||
raise FatalDeploymentError, "Failed to render and parse template" | |||
end | |||
|
|||
def record_invalid_template(err:, filename:, content:) | |||
def record_invalid_template(err:, filename:, content: nil) | |||
debug_msg = ColorizedString.new("Invalid template: #{filename}\n").red | |||
debug_msg += "> Error message:\n#{FormattedLogger.indent_four(err)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we also be suppressing (or replacing) the error itself? Just because it had a filename in it doesn't really tell us what else it contains
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this? 97a4fd3
deployment["spec"]["template"]["spec"]["containers"].first["ports"].first["name"] = bad_port_name | ||
end | ||
assert_deploy_failure(result) | ||
refute_logs_match(/Kubectl err:/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as below--need to be more specific with this assertion or it will be flakey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 9e13b30
@KnVerey If I prepare a PR for those last review comments, would this get merged today? |
I'm in a bunch of meetings right now, but I'll do my best |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor stuff, but I'd be ok with this as is
README.md
Outdated
4. Add the a basic example of the type to the hello-cloud [fixture set](https://github.com/Shopify/kubernetes-deploy/tree/master/test/fixtures/hello-cloud) and appropriate assertions to `#assert_all_up` in [`hello_cloud.rb`](https://github.com/Shopify/kubernetes-deploy/blob/master/test/helpers/fixture_sets/hello_cloud.rb). This will get you coverage in several existing tests, such as `test_full_hello_cloud_set_deploy_succeeds`. | ||
5. Add tests for any edge cases you foresee. | ||
4. Add the new class to list of resources in | ||
[`deploy_task.rb`](https://github.com/Shopify/kubernetes-deploy/blob/6a0dd662735bbcc0c0cf110d049a08a044a07dd1/lib/kubernetes-deploy/deploy_task.rb#L8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason these don't point to master?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in #438
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason these don't point to master?
I used a specific commit so the line numbers don't rot, not a huge deal.
lib/kubernetes-deploy/deploy_task.rb
Outdated
warn_msg = "WARNING: Any resources not mentioned in the error(s) below were likely created/updated. " \ | ||
"You may wish to roll back this deploy." | ||
@logger.summary.add_paragraph(ColorizedString.new(warn_msg).yellow) | ||
|
||
unidentified_errors = [] | ||
sensitive_filenames = resources.select(&:kubectl_output_is_sensitive?).map { |r| File.basename(r.file_path) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: sensitive_filenames
-> filenames_with_sensitive_content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in #438
@@ -21,6 +21,7 @@ def assert_all_up | |||
assert_stateful_set_up | |||
assert_job_up | |||
assert_network_policy_up | |||
assert_secret_created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why _created
and not _present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to avoid confusing this with the more generic method of the same name in the superclass.
…ew-feedback Secrets as resources: more review feedback
Looks like this is ready 💃 |
Thank you to everyone involved. This change is very important for my use case and I'm very grateful for it getting to master ❤️ |
@benlangfeld if you need this immediately, can you use a git ref to reference it from master for a few days? I'd really like to include #415 in the next release too; it's very nearly ready but Tim was off today. |
Absolutely. I also need #421 , but at least I can now rebase that on less of a moving target. |
Ref: #421 #411
Closes: #209
What are you trying to accomplish with this PR?
Have ejson secrets provisioned like any other resource, rather than separately/specially in
EjsonSecretProvisioner
. This PR also adds support for deploying secrets from yaml templates.How is this accomplished?
The individual commits tell the story, it might be easier to review this PR by stepping through them individually.
What could go wrong?
A scary-ish change in this PR is adding secrets to the prune whitelist on deploy task. We were pruning secrets before of course, but in a more targeted way. Some questions I have:
Still tbd after this: support for adding labels from ejson, and resolving how to 'template' ejson secrets. This PR is big enough already.