integrations/operator: re-use the teleport client instead of creating a new one #34050
Merged
Conversation
hugoShaka
commented
Oct 30, 2023
strideynet
approved these changes
Oct 31, 2023
strideynet
requested changes
Oct 31, 2023
tigrato
reviewed
Oct 31, 2023
Contributor
are we ever closing the client?
Contributor
This is the leak I spotted - the problem also exists in the existing code. We can probably adjust this PR to fix this since the other content of the PR is still valuable.
Contributor
Author
Fixed in 13de098a38a33c7890cc95bd6886c6a41fa8e4aa
This is not super clean, and we'll definitely want to get rid of it once tbot sends us clients with in-place cert renewal. However, the tbot changes won't be backported to v12/v13, so we need the current fix for those versions.
68243a8 to
13de098
Compare
strideynet
reviewed
Oct 31, 2023
360cf2b to
b3b6690
Compare
tigrato
reviewed
Nov 1, 2023
tigrato
approved these changes
Nov 2, 2023
b30cb50 to
ba32bcb
Compare
marcoandredinis
approved these changes
Nov 2, 2023
tigrato
reviewed
Nov 6, 2023
tigrato
approved these changes
Nov 6, 2023
d4bed68 to
90121a4
Compare
@hugoShaka See the table below for backport results.
hugoShaka
added a commit
that referenced
this pull request
Nov 9, 2023
… a new one (#34050) * integrations/operator: re-use the teleport client instead of creating a new one * fix race condition * address feedback + add godocs
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Nov 13, 2023
…eating a new one (#34431) * integrations/operator: re-use the teleport client instead of creating a new one (#34050) * integrations/operator: re-use the teleport client instead of creating a new one * fix race condition * address feedback + add godocs * fixup! integrations/operator: re-use the teleport client instead of creating a new one (#34050)
Fixes #24110
This PR addresses several major issues of the Teleport Operator:
This should fix the ongoing memory issues several users reported, greatly reduce the impact of broken reconciliation, and reduce the memory spikes when reconciling many resources. Another PR from @tigrato will reduce the number of unnecessary reconciliations. With both PRs, we should be in a much better place in terms of CPU/memory load on the operator side.
How it works
With this PR the embedded tbot now caches the client and can skip the whole connection dance if the certs have not changed. In the future, tbot will return us a single client with rolling certificates, which will simplify the whole thing.
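To illustrate the caching idea, here is a minimal sketch, not the actual operator code. It assumes the decision to reuse the client is made by comparing a hash of the current certificate with the hash seen when the client was built. `Client`, `clientCache`, and `Get` are hypothetical names, and the real connection logic is replaced by a counter.

```go
package main

import (
	"crypto/sha256"
	"sync"
)

// Client stands in for the real Teleport client (hypothetical type).
type Client struct{ id int }

// clientCache keeps the last client built and the hash of the
// certificate it was built from.
type clientCache struct {
	mu       sync.Mutex
	certHash [sha256.Size]byte
	client   *Client
	dials    int // how many times a new client had to be built
}

// Get returns the cached client when the certificate is unchanged,
// and builds a new one otherwise.
func (c *clientCache) Get(cert []byte) *Client {
	c.mu.Lock()
	defer c.mu.Unlock()

	h := sha256.Sum256(cert)
	if c.client != nil && h == c.certHash {
		// Same certs as last time: skip the connection setup entirely.
		return c.client
	}

	// Certs changed (or first call): build a new client.
	// A real implementation would dial Teleport here.
	c.dials++
	c.client = &Client{id: c.dials}
	c.certHash = h
	return c.client
}
```

Closing the client that gets replaced is deliberately left out of this sketch.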
This PR also wraps the teleport client in a new structure that contains an RWLock and tracks who is using the client. This allows us to ensure no one is using the client before closing it.
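A rough sketch of such a wrapper, under the assumption that users hold a read lock for as long as they use the client and that closing takes the write lock. The names `guardedClient`, `Acquire`, and `fakeClient` are made up for illustration and are not the types introduced by this PR.

```go
package main

import (
	"errors"
	"sync"
)

// fakeClient stands in for the real Teleport client (hypothetical type).
type fakeClient struct{ closed bool }

func (f *fakeClient) Close() error {
	f.closed = true
	return nil
}

var errClosed = errors.New("client already closed")

// guardedClient wraps a client with an RWMutex. Each user holds a read
// lock while working, so Close (which needs the write lock) blocks
// until every user has released the client.
type guardedClient struct {
	mu     sync.RWMutex
	client *fakeClient
	closed bool
}

// Acquire returns the client and a release function. The caller must
// call release exactly when it is done with the client.
func (g *guardedClient) Acquire() (*fakeClient, func(), error) {
	g.mu.RLock()
	if g.closed {
		g.mu.RUnlock()
		return nil, nil, errClosed
	}
	var once sync.Once // makes release safe to call more than once
	return g.client, func() { once.Do(g.mu.RUnlock) }, nil
}

// Close waits for all current users, then closes the client.
func (g *guardedClient) Close() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.closed {
		return nil
	}
	g.closed = true
	return g.client.Close()
}
```

A reconciler would call `Acquire`, defer the release function, and use the client. A renewal loop can then call `Close` on the old wrapper without cutting off in-flight requests.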
Finally, this PR adds an RWLock on tbot's memory destination to ensure safe reads from the sidecar. We don't want to read while tbot is writing the renewed cert, as this would end badly.
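As a simplified sketch of that locking, assume the memory destination is a map from artifact name to bytes. `memoryDestination` and its methods here are hypothetical and do not mirror tbot's actual API.

```go
package main

import "sync"

// memoryDestination is a hypothetical in-memory store for renewed
// certificates, protected by an RWMutex.
type memoryDestination struct {
	mu    sync.RWMutex
	store map[string][]byte
}

func newMemoryDestination() *memoryDestination {
	return &memoryDestination{store: make(map[string][]byte)}
}

// Write takes the exclusive lock, so no reader can observe a
// half-written artifact.
func (d *memoryDestination) Write(name string, data []byte) {
	d.mu.Lock()
	defer d.mu.Unlock()
	// Copy so later changes to the caller's slice do not leak in.
	d.store[name] = append([]byte(nil), data...)
}

// Read takes the shared lock. Several readers may proceed in
// parallel, but never while a Write is in progress.
func (d *memoryDestination) Read(name string) ([]byte, bool) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	data, ok := d.store[name]
	if !ok {
		return nil, false
	}
	// Return a copy so the caller cannot mutate the stored bytes.
	return append([]byte(nil), data...), true
}
```

Reading several artifacts that must match each other, such as a cert and its key, would need one lock held across all the reads. This sketch does not cover that case.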
changelog: The operator reuses its connection to Teleport. Reduces CPU usage, logs, and fixes a memory leak.