Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add native sidecar support #11465

Merged
merged 22 commits into from
Nov 22, 2023
Merged

Add native sidecar support #11465

merged 22 commits into from
Nov 22, 2023

Conversation

teejaded
Copy link
Contributor

@teejaded teejaded commented Oct 6, 2023

Kubernetes has introduced native sidecar support in version 1.28. This feature improves network proxy sidecar compatability for jobs and initContainers.

Introduce a new annotation config.alpha.linkerd.io/proxy-enable-native-sidecar and configuration option Proxy.NativeSidecar that causes the proxy container to run as an init-container.

Fixes: #11461

@teejaded teejaded requested a review from a team as a code owner October 6, 2023 23:59
@wmorgan
Copy link
Member

wmorgan commented Oct 7, 2023

Very cool to see this! Thank you. We will take a look.

@teejaded
Copy link
Contributor Author

I think we might want to control the ordering of the initContainers. Right now, as this is existing initContainers can execute before the proxy starts up.

❯ kubectl get pod init-test-84d9f97845-s5p5j -o yaml | yq .status.initContainerStatuses
- containerID: containerd://6c37dc9562ccf6265b77a42c7076eaf75a8fe444cf119967cb86464eb5ed2062
  image: docker.io/curlimages/curl:latest
  imageID: docker.io/curlimages/curl@sha256:961cf9e2a1939ea380b3f16e313a581b5d4681dd9dc4b1ace060eb396a71df0d
  lastState: {}
  name: curl
  ready: true
  restartCount: 0
  started: false
  state:
    terminated:
      containerID: containerd://6c37dc9562ccf6265b77a42c7076eaf75a8fe444cf119967cb86464eb5ed2062
      exitCode: 0
      finishedAt: "2023-10-10T19:29:16Z"
      reason: Completed
      startedAt: "2023-10-10T19:29:16Z"
- containerID: containerd://68d680f701ea0663c6157044700c673284712b78205bcd0c7fa66b541592339a
  image: cr.l5d.io/linkerd/proxy-init:v2.2.1
  imageID: cr.l5d.io/linkerd/proxy-init@sha256:20349a461f9fb76fde33741a90f9de2a647068f506325ac5e0faf7b7bc2eea72
  lastState: {}
  name: linkerd-init
  ready: true
  restartCount: 0
  started: false
  state:
    terminated:
      containerID: containerd://68d680f701ea0663c6157044700c673284712b78205bcd0c7fa66b541592339a
      exitCode: 0
      finishedAt: "2023-10-10T19:29:18Z"
      reason: Completed
      startedAt: "2023-10-10T19:29:17Z"
- containerID: containerd://5095f1a207dc80e4c7ded02c87d27a88661a009b55b4c0e7670e532f8bdacc4b
  image: cr.l5d.io/linkerd/proxy:stable-2.14.0
  imageID: cr.l5d.io/linkerd/proxy@sha256:8bdf507a00c154b3fa0916acc98d80bfdcb0fb8ca1fa3d1e12cbba82604ef23e
  lastState: {}
  name: linkerd-proxy
  ready: true
  restartCount: 0
  started: true
  state:
    running:
      startedAt: "2023-10-10T19:29:19Z"

Copy link
Contributor

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need to consider the shutdown logic as well. Happy to sync up and explain more when I am back from ooo, I spent a lot of time on this for Istio

@kflynn
Copy link
Member

kflynn commented Oct 18, 2023

Hey @howardjohn! Definitely happy to chat further, I'll hit you up on Slack to sort out a time, thanks!

@howardjohn
Copy link
Contributor

Getting back to this.

So here is what we are doing in Istio. I think its optimal and applies to Linkerd, but happy to improve it if now

Startup:

  • Run init container for iptables setup
  • Run "sidecar" container (init with Restart=Always).
    Use StartupProbe. Ideally with interval=1s to optimize startup time.

Shutdown:
In Istio we have a "drain" phase where we accept new connections but bias them away (connection=close, GOAWAY). Not sure if you have similar.

Starting in 1.29, on pod termination, Sidecars will not get a SIGTERM. Instead, their preStop hook is called.
In istio, we use this to trigger the draining (via an HTTP call).

Eventually, the app containers will shutdown. At this point, sidecar is useless.
Kubelet sends a SIGTERM, and we exit ~immediately. If you have to flush stats or anything, this would be a good time, but no need to wait around for users connections.

The shutdown part changes in 1.29, not yet merged though. Before in 1.28, the SIGTERM is on pod shutdown start ,and no preStop hook

@kflynn
Copy link
Member

kflynn commented Oct 18, 2023

@howardjohn Thanks, appreciate it!

@kflynn
Copy link
Member

kflynn commented Oct 19, 2023

(For the record: the Linkerd team is doing some internal research on this; more news as events warrant. 🙂)

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @teejaded for tackling this; the general approach looks solid to me 👍

Before we're able to test it, the addition of sidecars into the control-plane components needs to be addressed. Unlike proxies in the data-plane that are dynamically injected by the injector webhook, the control-plane proxies are added from the get-go via direct inclusion of the partials.proxy partial template. So they're still getting added as regular containers, and given the restartPolicy: Always is not allowed there, these workloads are getting rejected. Fixing this should be a matter of just moving the inclusion of partials.proxy into the init containers section.

As for your comment about controlling the init containers ordering, I think the spirit of the sidecars KEP is to serve as auxiliary containers for the non-init containers, so let's limit ourselves to that case. Currently linkerd doesn't proxy traffic for init containers, so it's not like we'd be introducing a regression here.

Can you please declare the new proxy.nativeSidecar setting into the values.yaml file, defaulting to false? Please add it with a comment explaining:

  • this is an experimental feature
  • it requires k8s >= 1.28
  • it requires the k8s scheduler feature gate SidecarContainers to be enabled
  • The setting .proxy.waitBeforeExitSeconds is discarded in this mode (because the lack of native sidecar support for preStop hooks). Similarly, It'd be good also to add a warning in the proxy.waitBeforeExistSeconds explaining it gets ignored when proxy.nativeSidecar is enabled.

That comment will get picked up in the chart's README.md file when running ./bin/helm-docs

Also, about the new golden files introduced, I think you can reuse the ones we already have.

Finally, could you also add the new annotation config.alpha.linkerd.io/proxy-enable-native-sidecar into the file ./cli/cmd/doc.go so it gets picked up by the website's reference docs?

I'll come back with more comments, once we're able to give this a try.
For reference, this is the k3d command I'm using to bring up an adequate cluster, and the linkerd commands to turn the feature on:

k3d cluster create --image=+v1.28 --k3s-arg '--kube-scheduler-arg=feature-gates=SidecarContainers=true@server:*'
bin/linkerd install --crds | kubectl apply -f -
bin/linkerd install --set proxy.nativeSidecar=true | kubectl apply -f -

@howardjohn Thanks a lot of your input, it wasn't clear to me from the KEP doc that the preStop hook isn't enacted in 1.28, so this tip saves me some pain. I'd like to give this a detailed test once the control plane issue mentioned above is addressed and see if we need to further iterate on the shutdown workflow, at least just on the k8s manifests side of things for now.

@teejaded
Copy link
Contributor Author

@alpeb

Are you certain about the initContainer ordering? I set config.linkerd.io/default-inbound-policy: all-authenticated on a namespace (which is when I discovered that default authorizations were not working for initContainers) and confirmed that init-containers cannot access other pods unless I put the proxy before other init-containers.

This is one of the test cases called out by Istio in their blog announcing support for the new feature. https://istio.io/latest/blog/2023/native-sidecars/#init-container-traffic

Using native sidecars seemed like a bad fit for the control plane pods so I've forced it off in that scenario. The proxy startupProbe can not succeed when the control plane is down which causes a crashloop. We could omit the startupProbe when rendering those, but it seems cleaner not to use the feature for that right now.

@alpeb
Copy link
Member

alpeb commented Oct 25, 2023

Thanks for the quick turnaround @teejaded.

Are you certain about the initContainer ordering? I set config.linkerd.io/default-inbound-policy: all-authenticated on a namespace (which is when I discovered that default authorizations were not working for initContainers) and confirmed that init-containers cannot access other pods unless I put the proxy before other init-containers.

Ok fair enough. I see you're already injecting linkerd-init and linkerd-proxy at the beginning 👍 Also, nice fix in the policy controller 👍

Using native sidecars seemed like a bad fit for the control plane pods so I've forced it off in that scenario. The proxy startupProbe can not succeed when the control plane is down which causes a crashloop. We could omit the startupProbe when rendering those, but it seems cleaner not to use the feature for that right now.

For proxies to become ready they only require to acquire their identity with the linkerd-identity controller. So only in the case of the linkerd-identity pod does the main container (linkerd-identity) need to run before the proxy. In that case we should not use a native sidecar. For the others (linkerd-proxy-injector and linkerd-destination) we can enable it. But their proxy is gonna start very quick, before linkerd-identity is ready to serve, so you need to give it extra slack with a startup probe with a higher initialDelaySeconds to avoid pod restarts. This means you'll have to parametrize that value as well, defaulting to zero in all but this case.

@alpeb
Copy link
Member

alpeb commented Nov 3, 2023

Hi @teejaded, just reaching out to check if need any assistance. The rust audit failure should be gone once you rebase. And the other two failures are likely flakes.

@howardjohn
Copy link
Contributor

howardjohn commented Nov 3, 2023 via email

Kubernetes has introduced native sidecar support in version 1.28.  This feature improves network proxy sidecar compatability for jobs and initContainers.

Introduce a new annotation config.alpha.linkerd.io/proxy-enable-native-sidecar and configuration option Proxy.NativeSidecar that causes the proxy container to run as an init-container.

Fixes: linkerd#11461

Signed-off-by: TJ Miller <[email protected]>
@teejaded
Copy link
Contributor Author

teejaded commented Nov 3, 2023

@alpeb

Rebased. Rust audit job works locally now via nektos/act.

Is there an advantage to running native sidecars in the control plane?

@alpeb
Copy link
Member

alpeb commented Nov 3, 2023

Is there an advantage to running native sidecars in the control plane?

Just the general improvements that this new pattern brings: proper shutdown ordering without resorting to hacks, ability to keep pod network up in case other logging sidecars require it, avoid the proxy using the default container slot...

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @teejaded, I think we're almost there from my perspective.
I've been testing a lot the control plane startup flow, and we were still getting probes failures. In the identity controller, now that the proxy starts first, it tries to reach the main container from the get go and ends up in a failfast state which delays the pod startup 10s more. And this further impacts the probes and startup times for the other controllers. The only way I could resolve this was to not use a native sidecar for identity. For the other controllers I bumped a bit the startup prob initial delay, as shown below.

charts/partials/templates/_proxy.tpl Outdated Show resolved Hide resolved
charts/linkerd-control-plane/templates/identity.yaml Outdated Show resolved Hide resolved
charts/linkerd-control-plane/templates/identity.yaml Outdated Show resolved Hide resolved
charts/linkerd-control-plane/templates/identity.yaml Outdated Show resolved Hide resolved
charts/linkerd-control-plane/templates/proxy-injector.yaml Outdated Show resolved Hide resolved
charts/linkerd-control-plane/templates/destination.yaml Outdated Show resolved Hide resolved
charts/linkerd-control-plane/values.yaml Outdated Show resolved Hide resolved
cli/cmd/doc.go Outdated Show resolved Hide resolved
controller/webhook/util.go Outdated Show resolved Hide resolved
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one tiny more thing to go, but this looks good to me 👍
I think once k3d adds support for k8s 1.29 we can add an integration test, probably just a smoke test, as a separate PR of course. @teejaded you're welcome to take on that effort if you like! 🙂

cli/cmd/doc.go Outdated Show resolved Hide resolved
@bartwitkowski
Copy link

Hey @alpeb and @teejaded,
So this native sidecar feature (which we are very much waiting for) will require K8s >= 1.29 and not 1.28?
If so, that is pitty, because we would have to wait another 4 months to make it available in AKS GA (AKS version 1.28 is only few days old) :-(.

Apart from that, looking at the new native sidecar container code both K8s versions are mentioned:
K8s 1.28 ProxyEnableNativeSidecarAnnotation
K8s 1.29 proxy.nativeSidecar

I'm confused...

@alpeb
Copy link
Member

alpeb commented Nov 15, 2023

@bartwitkowski You can use it in k8s 1.28 but I wouldn't recommend it. It is disabled by default; you need to enable it with the SidecarContainers feature flag (is that possible in AKS?) and proper shutdown ordering isn't implemented, which means the proxy won't wait for the main containers to finish before it shuts down.
As for both versions being mentioned, yep, that still needs to be addressed here.

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome :D
Excited to get this merged

httpGet:
path: /ready
port: {{.Values.proxy.ports.admin}}
initialDelaySeconds: {{.Values.proxy.startupProbeInitialDelaySeconds | default 0}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these variables should be added to values.yaml. This will allow us to set default values in values.yaml instead of needing to do defaulting inline in the template like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline defaults are for injection because I didn't plumb the values into pkg/charts/linkerd2.Proxy.

I'm happy to do this, but I think this would expose those settings to users to modify which we discussed above as being out of scope or perhaps undesired. Please let me know if I'm mistaken.

@@ -1,4 +1,5 @@
{{ $prefix := .Values.pathPrefix -}}
{{ $initIndex := ternary "0" "-" (dig "proxy" "nativeSidecar" false (merge (dict) .Values)) -}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have a template comment here explaining this. if I understand correctly, the reasoning is that in native sidecar mode, we can put our containers first so that other init containers can take advantage of the proxy but without native sidecar, the proxy-init and network-validator init containers must be last to ensure there are no containers that start after iptables has been configured but before the proxy has started. in other words, the proxy-init, network-validator, and proxy contains must be started consecutively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree a comment is needed. My goal here was partially to preserve the current behavior when nativeSidecar was not enabled since it's entirely possible someone is relying on the order of the containers in a kubectl patch, kustomize or something.

@@ -62,14 +63,14 @@
},
{
"op": "add",
"path": "{{$prefix}}/spec/initContainers/-",
"path": "{{$prefix}}/spec/initContainers/{{$initIndex}}{{$initIndex = add1 $initIndex}}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when add1 is called on "-"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It silently becomes 1. I thought it was an acceptable quirk since in that case the variable is not used again.

https://helm-playground.com/#t=LQhQG92ACASBDaAuAvNARMd0C%2BPQCmAHvALYAOANgUtJHIgD7QCOArgPYAuBu%2BkMBNDTwAJqICMDPoRIVqAJlr0heUEA&v=LQhQqA

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome :D excited to get this merged

@bartwitkowski
Copy link

@alpeb after reading hurray-optymistics William Morgan article about this, I was hoping to implement it soon-ish. It turns out that even after 3 months it is not done on the Linkerd side and it is really alpha feature in K8s. That's a pitty 😞

But well, 🤞 that around GA in K8s 1.30 and L5d 1.26 (is there a roadmap for this feature?) we could use it :-).

PS. Now we have 2 projects using L5d. One is simple and there is all ok, and the other one is big and complicated and there we have many problem with implementation... Almost all would be fixed by this feature.

@alpeb
Copy link
Member

alpeb commented Nov 20, 2023

But well, 🤞 that around GA in K8s 1.30 and L5d 1.26 (is there a roadmap for this feature?) we could use it :-).

As soon as this merges it'll get included in an edge, and then in stable-2.15.0

@alpeb
Copy link
Member

alpeb commented Nov 20, 2023

Thanks for the latest changes @teejaded. Can you also please update the description in cli/cmd/doc.go ?

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 🎉 🎉

@alpeb alpeb merged commit 1b37e19 into linkerd:main Nov 22, 2023
42 checks passed
@hawkw hawkw mentioned this pull request Nov 22, 2023
hawkw added a commit that referenced this pull request Nov 22, 2023
## edge-23.11.4

This edge release introduces support for the native sidecar containers
entering beta support in Kubernetes 1.29. This improves the startup and
shutdown ordering for the proxy relative to other containers, fixing the
long-standing shutdown issue with injected `Job`s. Furthermore, traffic
from other `initContainer`s can now be proxied by Linkerd.

In addition, this edge release includes Helm chart improvements, and
improvements to the multicluster extension.

* Added a new `config.alpha.linkerd.io/proxy-enable-native-sidecar`
  annotation and `Proxy.NativeSidecar` Helm option that causes the proxy
  container to run as an init-container (thanks @teejaded!) (#11465;
  fixes #11461)
* Fixed broken affinity rules for the multicluster `service-mirror` when
  running in HA mode (#11609; fixes #11603)
* Added a new check to `linkerd check` that ensures all extension
  namespaces are configured properly (#11629; fixes #11509)
* Updated the Prometheus Docker image used by the `linkerd-viz`
  extension to v2.48.0, resolving a number of CVEs in older Prometheus
  versions (#11633)
* Added `nodeAffinity` to `deployment` templates in the `linkerd-viz`
  and `linkerd-jaeger` Helm charts (thanks @naing2victor!) (#11464;
  fixes #10680)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement KEP753 initContainer restartPolicy=Always sidecar
7 participants