Skip to content

helm: add wait initContainer and preStop hooks#20106

Merged
hugoShaka merged 12 commits intomasterfrom
hugo/helm-wait-hooks
Jan 13, 2023
Merged

helm: add wait initContainer and preStop hooks#20106
hugoShaka merged 12 commits intomasterfrom
hugo/helm-wait-hooks

Conversation

@hugoShaka
Copy link
Copy Markdown
Contributor

@hugoShaka hugoShaka commented Jan 11, 2023

Part of RFD-096: managing the major upgrades safely

This PR's main purpose is to block proxies running a new Teleport major version from connecting to auth pods running an old Teleport version.

This PR contains three commits:

  • one adding initContainers and preStop hooks to the teleport-cluster Helm chart (initContainers were designed in RFD 096, preStop was a nice additoin coming from the wait PR)
  • one fixing a bug in the wait command (the DNS error was not properly unwrapped and not recognized as a DNS error)
  • one fixing missing override support on some auth Dpeloyment values. As a rule of thumb for future review, we should not use .Values directly and prefer using $auth and $proxy

@hugoShaka hugoShaka force-pushed the hugo/helm-wait-hooks branch from 2e0c3e4 to e29868d Compare January 11, 2023 23:37
@hugoShaka hugoShaka added the helm label Jan 11, 2023
@hugoShaka hugoShaka marked this pull request as ready for review January 11, 2023 23:44
Comment thread examples/chart/teleport-cluster/templates/auth/service-previous-version.yaml Outdated
Comment thread examples/chart/teleport-cluster/templates/auth/service-previous-version.yaml Outdated
lifecycle:
# waiting during preStop ensures no new request will hit the Terminating pod
# on clusters using kube-proxy (kube-proxy syncs the node iptables rules every 30s)
preStop:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could also use this to have proxies unregister themselves on shutdown in future to mitigate #20057

Copy link
Copy Markdown
Contributor Author

@hugoShaka hugoShaka Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unregistering would be a nice improvement, but I'm not sure preStop is the best place because it is implementation-specific (Terraform/CloudFormation have the same issue), and we don't have control over the concurrency with preStop. Maybe the proxy will take 2 minutes to shut down, and I don't know what would happen if it gets unregistered while it has not finished doing its job. IMO the best place to do it would be in the graceful shutdown sequence.

Agendas are a little packed right now because of v12, but I'll raise this issue to folks who know more than me about registration and shutdown sequence once things settle down.

Comment thread examples/chart/teleport-cluster/values.yaml Outdated
Co-authored-by: Gus Luxton <gus@goteleport.com>
@hugoShaka hugoShaka enabled auto-merge (squash) January 13, 2023 00:38
@hugoShaka hugoShaka merged commit fb14caf into master Jan 13, 2023
@hugoShaka hugoShaka deleted the hugo/helm-wait-hooks branch January 13, 2023 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants