-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Have the Webhook react to pod creation/update only #2472
Conversation
7b3b5cb
to
2ab8edd
Compare
This is ready for review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this is my environment, but getting startup failures following auto-injection:
$ kubectl -n emojivoto logs -f pod/vote-bot-644ddf8596-n44tz linkerd-proxy
ERR! linkerd2_proxy::transport::tls::identity Invalid DNS name: [46, 112, 111, 100, 46, 101, 109, 111, 106, 105, 118, 111, 116, 111, 46, 108, 105, 110, 107, 101, 114, 100, 45, 109, 97, 110, 97, 103, 101, 100, 46, 108, 105, 110, 107, 101, 114, 100, 46, 115, 118, 99, 46, 99, 108, 117, 115, 116, 101, 114, 46, 108, 111, 99, 97, 108]
configuration error: InvalidEnvVar
steps to reproduce
bin/linkerd install --proxy-auto-inject --tls optional | kubectl apply -f -
use this gist:
curl https://gist.githubusercontent.com/siggy/867ae15d7e1c48c0d6cb2fb4f92f24d5/raw/5c4f5738e56b19f14dc42faa3773c1da72fae6d5/emojivoto-auto-inject.yml | kubectl apply -f -
Diffing the output of kubectl -n emojivoto get all -oyaml
against master reveals a few notable differences:
this branch
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: emoji-token-pjdtg
readOnly: true
...
- name: LINKERD2_PROXY_ID
value: .pod.$LINKERD2_PROXY_POD_NAMESPACE.linkerd-managed.linkerd.svc.cluster.local
master
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-rz8qb
readOnly: true
...
- name: LINKERD2_PROXY_ID
value: emoji.deployment.$LINKERD2_PROXY_POD_NAMESPACE.linkerd-managed.linkerd.svc.cluster.local
...
# this block only present on master
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-rz8qb
readOnly: true
...
# this block only present on master
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-rz8qb
readOnly: true
If this is particularly difficult to fix, I should note that the |
Thanks for the detailed check @siggy! (I'll have to get into the habit of testing things with TLS enabled...) I tried fixing the As for the missing serviceaccount mount paths, I believe the cause is as follows: TL;DR: The injected sidecar + initContainer don't pass through the Service Account admission controller, so we have to manually add the missing mount paths. The Service Account admission controller is the one in charge of automatically adding those mounts when a pod enters the system, and that's always run before our mutating admission webhook. When we were dealing with Deployments, we injected our sidecar and initContainer into the Deployment spec, then that created a Pod which went through the Service Account admission controller and got its mount paths set. Now that we listen to Pod creation only, the Pod we receive in the webhook has already gone through the Service Account admission controller, which is confirmed by the mount path being correctly set in the uninjected container. We then proceed to inject our containers, but the Pod won't go again through the Service Account admission controller, so the mount path for our new containers won't get set. As a solution I will be manually adding those mount paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm still seeing:
$ kubectl -n emojivoto logs -f pod/emoji-7cdf7ff5d-9d94q linkerd-proxy
ERR! linkerd2_proxy::transport::tls::identity Invalid DNS name: [101, 109, 111, 106, 105, 45, 55, 99, 100, 102, 55, 102, 102, 53, 100, 45, 46, 112, 111, 100, 46, 101, 109, 111, 106, 105, 118, 111, 116, 111, 46, 108, 105, 110, 107, 101, 114, 100, 45, 109, 97, 110, 97, 103, 101, 100, 46, 108, 105, 110, 107, 101, 114, 100, 46, 115, 118, 99, 46, 99, 108, 117, 115, 116, 101, 114, 46, 108, 111, 99, 97, 108]
configuration error: InvalidEnvVar
Decoding those bytes yields:
emoji-7cdf7ff5d-.pod.emojivoto.linkerd-managed.linkerd.svc.cluster.local
...maybe the trailing hyphen is a problem?
- name: LINKERD2_PROXY_ID
value: emoji-7cdf7ff5d-.pod.$LINKERD2_PROXY_POD_NAMESPACE.linkerd-managed.linkerd.svc.cluster.local
Following up on our conversation, I've disabled |
497e284
to
687de9b
Compare
I've just rebased now that the |
687de9b
to
967a8e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, glad to see that this is straightforward to setup. I've suggested a different approach below for finding pod owners, so that we don't have to implement owner reference parsing twice in the codebase, but let me know if my comments don't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just one question.
One TIOLI: I notice there is still a One suggestion is to just replace the two constructors with a more general |
@ihcsim yeah, |
In addition to the issue with auto-injecting single pod without labels, there is another minor one. The log
I think the |
Actually |
967a8e1
to
17c6f85
Compare
Thank you guys for all your feedback. Here's a new push with everything addressed. @klingerf I'm now using @ihcsim Given the payload received by the webhook is a little different in regards to metadata, I had to get rid of the Sorry I had to rebase one more time, to keep in sync with the Identity changes. |
This was already working almost out-of-the-box, just had to: - Change the webhook config so it watches pods instead of deployments - Grant some extra ClusterRole permissions - Add the piece that figures what's the OwnerReference and add the label for it Fixes #2342 and #1751 Signed-off-by: Alejandro Pedraza <[email protected]>
Signed-off-by: Alejandro Pedraza <[email protected]>
Signed-off-by: Alejandro Pedraza <[email protected]>
Signed-off-by: Alejandro Pedraza <[email protected]>
Signed-off-by: Alejandro Pedraza <[email protected]>
17c6f85
to
43edcab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for rebasing with master
. Some comments below.
pkg/inject/inject.go
Outdated
@@ -362,10 +355,20 @@ func (conf *ResourceConfig) complete(template *v1.PodTemplateSpec) { | |||
conf.pod.Meta = &template.ObjectMeta | |||
} | |||
|
|||
// GetPod returns the parsed object as a *v1.Pod variable | |||
func (conf *ResourceConfig) GetPod() (*v1.Pod, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not needed if you moved the logic of webhook.getLabelForParent()
into this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inject.go
is not dependent on k8s.API
, which is needed in getLabelForParent()
. That's why I kept the pod object retrieval logic separated from the logic that retrieves its ownership, which is only used in the webhook.
@@ -27,21 +27,25 @@ func newReport(conf *ResourceConfig) Report { | |||
var name string | |||
if m := conf.pod.Meta; m != nil { | |||
name = m.Name | |||
if name == "" { | |||
name = m.GenerateName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how useful this will be, as it only returns the prefix of the pod name like linkerd-controller-7c6b8c586b-
. Will there be cases where the pod name will be empty? The parent workload name, if non-empty, might be more useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pod name appears to be always empty in the admission request metadata, and we only have access to that GenerateName
, which is actually the ReplicaSet's name (plus a dash). I think this is a bit more informative (linkerd-controller-7c6b8c586b-
) than the Deployment's name (linkerd-controller
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for switching to k8s.API. Approach looks good to me, but I was a bit confused by the handling of the namespace in the pod meta -- see my comments below. Also had just a few other random bits of feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all works on my machine. Had mostly questions...
Signed-off-by: Alejandro Pedraza <[email protected]>
return pkgK8s.ProxyStatefulSetLabel, name, nil | ||
} | ||
return "", "", fmt.Errorf("unsupported parent kind \"%s\"", kind) | ||
func (w *Webhook) ownerRetriever(ns string) inject.OwnerRetrieverFunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
who said go cannot haz curry? 🐱
@@ -154,158 +153,6 @@ func TestGetPatch(t *testing.T) { | |||
}) | |||
} | |||
|
|||
func TestGetLabelForParent(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we no longer have GetLabelForParent()
I removed this test, but created #2560 to complete the webhook tests, that we can address in the 2.3 testing sprint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⭐️ Looks great! Thanks for making those updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the updates! 👍 🚢
Signed-off-by: Alejandro Pedraza <[email protected]>
Integration test results for 122bdb2: success 🎉 |
This branch cherry-picks #2472 onto `release/v2.203`. Currently, the proxy [depends on an outdated version of `rustls`][1], v0.20.8. The `rustls` dependency is via our dependency on `tokio-rustls` v0.23.4; we don't have a direct `rustls` dependency, in order to ensure that the version of `rustls` is always the same version as used by `tokio-rustls`. `rustls` also has a dependency on `webpki`, and v0.20.x of `rustls` uses the original `webpki` crate, rather than the `rustls-webpki` crate. So, unfortunately, because we have a transitive dep on `webpki` via `rustls`, PR linkerd/linkerd2-proxy#2465 did not remove _all_ `webpki` deps from our dependency tree, only the direct dependency. This branch updates to `rustls` v0.21.x, which depends on `rustls-webpki` rather than `webpki`, removing the `webpki` dependency. This is accomplished by updating `tokio-rustls` to v0.24.x, implicitly updating the transitive `rustls` dep. In order to update to the semver-incompatible version of `rustls`, it was necessary to modify our code in order to track some breaking API changes. I've also added a `cargo-deny` ban for `webpki` to our `deny.toml`, to ensure that we always use the actively-maintained `rustls-webpki` crate rather than `webpki` classic. Since peer certificate validation is performed through `rustls` rather than through the direct `rustls-webpki` dependency, this should hopefully resolve issues with issuer certs that contain name constraints --- these were not fixed by linkerd/linkerd2-proxy#2465, because the failure with certs containing name constraints occurred inside of the *`webpki` version depended on by `rustls`*, rather than inside of the proxy's direct dep. See [this comment][2] for details. In addition, it was necessary to update `rustls-webpki` to v0.101.6, since v0.101.5 was yanked due to an accidental API breaking change. <details> <summary>Verifying that we no longer depend on `webpki`:</summary> Before: ```console $ cargo tree -p webpki -i webpki v0.22.1 ├── rustls v0.20.8 │ └── tokio-rustls v0.23.4 │ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration) │ └── linkerd-meshtls-rustls v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/meshtls/rustls) │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) │ │ ├── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) │ │ │ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration) │ │ │ └── linkerd2-proxy v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd2-proxy) │ │ ├── linkerd-app-admin v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/admin) │ │ │ └── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*) │ │ │ [dev-dependencies] │ │ │ └── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration) │ │ └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) │ │ └── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*) │ │ [dev-dependencies] │ │ └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*) │ ├── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) │ │ ├── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*) │ │ └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*) │ │ [dev-dependencies] │ │ └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*) │ └── linkerd-meshtls v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/meshtls) │ ├── linkerd-app-core v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/core) │ │ ├── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*) │ │ ├── linkerd-app-admin v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/admin) (*) │ │ ├── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*) │ │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*) │ │ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration) │ │ ├── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*) │ │ └── linkerd-app-test v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/test) │ │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*) │ │ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration) │ │ └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*) │ │ [dev-dependencies] │ │ ├── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*) │ │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*) │ │ └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*) │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*) │ ├── linkerd-proxy-tap v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/proxy/tap) │ │ └── linkerd-app-core v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/core) (*) │ └── linkerd2-proxy v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd2-proxy) │ [dev-dependencies] │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*) │ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration) │ └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*) │ [dev-dependencies] │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*) │ └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*) └── tokio-rustls v0.23.4 (*) ``` After: ```console $ cargo tree -p webpki -i error: package ID specification `webpki` did not match any packages ``` </details> [1]: https://github.com/linkerd/linkerd2-proxy/blob/8afc72258b8ced868fbd0bde0235955c0adf4ccd/Cargo.lock#L2450-L2460C2 [2]: #9299 (comment) --- 0e843c9f meshtls: update to `rustls` v0.21.7 (linkerd/linkerd2-proxy#2473) Signed-off-by: Eliza Weisman <[email protected]>
This branch cherry-picks #2472 onto `release/v2.203`. Currently, the proxy [depends on an outdated version of `rustls`][1], v0.20.8. The `rustls` dependency is via our dependency on `tokio-rustls` v0.23.4; we don't have a direct `rustls` dependency, in order to ensure that the version of `rustls` is always the same version as used by `tokio-rustls`. `rustls` also has a dependency on `webpki`, and v0.20.x of `rustls` uses the original `webpki` crate, rather than the `rustls-webpki` crate. So, unfortunately, because we have a transitive dep on `webpki` via `rustls`, PR linkerd/linkerd2-proxy#2465 did not remove _all_ `webpki` deps from our dependency tree, only the direct dependency. This branch updates to `rustls` v0.21.x, which depends on `rustls-webpki` rather than `webpki`, removing the `webpki` dependency. This is accomplished by updating `tokio-rustls` to v0.24.x, implicitly updating the transitive `rustls` dep. In order to update to the semver-incompatible version of `rustls`, it was necessary to modify our code in order to track some breaking API changes. I've also added a `cargo-deny` ban for `webpki` to our `deny.toml`, to ensure that we always use the actively-maintained `rustls-webpki` crate rather than `webpki` classic. Since peer certificate validation is performed through `rustls` rather than through the direct `rustls-webpki` dependency, this should hopefully resolve issues with issuer certs that contain name constraints --- these were not fixed by linkerd/linkerd2-proxy#2465, because the failure with certs containing name constraints occurred inside of the *`webpki` version depended on by `rustls`*, rather than inside of the proxy's direct dep. See [this comment][2] for details. In addition, it was necessary to update `rustls-webpki` to v0.101.6, since v0.101.5 was yanked due to an accidental API breaking change. [1]: https://github.com/linkerd/linkerd2-proxy/blob/8afc72258b8ced868fbd0bde0235955c0adf4ccd/Cargo.lock#L2450-L2460C2 [2]: #9299 (comment) --- 0e843c9f meshtls: update to `rustls` v0.21.7 (linkerd/linkerd2-proxy#2473) Signed-off-by: Eliza Weisman <[email protected]>
This branch cherry-picks #2472 onto `release/v2.203`. Currently, the proxy [depends on an outdated version of `rustls`][1], v0.20.8. The `rustls` dependency is via our dependency on `tokio-rustls` v0.23.4; we don't have a direct `rustls` dependency, in order to ensure that the version of `rustls` is always the same version as used by `tokio-rustls`. `rustls` also has a dependency on `webpki`, and v0.20.x of `rustls` uses the original `webpki` crate, rather than the `rustls-webpki` crate. So, unfortunately, because we have a transitive dep on `webpki` via `rustls`, PR linkerd/linkerd2-proxy#2465 did not remove _all_ `webpki` deps from our dependency tree, only the direct dependency. This branch updates to `rustls` v0.21.x, which depends on `rustls-webpki` rather than `webpki`, removing the `webpki` dependency. This is accomplished by updating `tokio-rustls` to v0.24.x, implicitly updating the transitive `rustls` dep. In order to update to the semver-incompatible version of `rustls`, it was necessary to modify our code in order to track some breaking API changes. I've also added a `cargo-deny` ban for `webpki` to our `deny.toml`, to ensure that we always use the actively-maintained `rustls-webpki` crate rather than `webpki` classic. Since peer certificate validation is performed through `rustls` rather than through the direct `rustls-webpki` dependency, this should hopefully resolve issues with issuer certs that contain name constraints --- these were not fixed by linkerd/linkerd2-proxy#2465, because the failure with certs containing name constraints occurred inside of the *`webpki` version depended on by `rustls`*, rather than inside of the proxy's direct dep. See [this comment][2] for details. In addition, it was necessary to update `rustls-webpki` to v0.101.6, since v0.101.5 was yanked due to an accidental API breaking change. [1]: https://github.com/linkerd/linkerd2-proxy/blob/8afc72258b8ced868fbd0bde0235955c0adf4ccd/Cargo.lock#L2450-L2460C2 [2]: #9299 (comment) --- 0e843c9f meshtls: update to `rustls` v0.21.7 (linkerd/linkerd2-proxy#2473) Signed-off-by: Eliza Weisman <[email protected]>
This was already working almost out-of-the-box, just had to:
for it
Tested fine when applying new yamls for Deployments and StatefulSets.
Fixes #2342 and #1751