Support running a version server in the proxy#35150
Conversation
|
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with |
bernardjkim
left a comment
There was a problem hiding this comment.
Looks good to me! I just have one question about future support for a Teleport client tools version server.
7acb18d to
0ac8a66
Compare
There was a problem hiding this comment.
Would it make sense to create another source that would return the Proxy's version?
I would imagine it to be the default one for a lot of scenarios.
There was a problem hiding this comment.
This is a good idea but I think we would be conflicting with some goals from security: they are trying to hide the minor/bugfix versions from the ping endpoint to make it harder for an attacker to identify if the teleport instance is out-of-date and vulnerable.
If people from cloud and security are OK I'll gladly implement that, I think it would solve the needs of 99% of self-hosted users.
There was a problem hiding this comment.
Should we only consider major versions?
There are several cases where, if the auth server is behind the agent version, we can lose audit events because auth cannot (or doesn't know how to) unmarshal them from the proto oneof.
We should always enforce proxies/auth to be the highest version (major.minor.patch) before upgrading agents. Sometimes we also use the same approach to release some features during minor upgrades.
There was a problem hiding this comment.
The request from cloud folks was to explicitly block major versions to unlock their automatic update rollout. I think that running a higher bugfix version of the agents might be valid, at least, it should be compatible according to our compatibility guarantees.
There was a problem hiding this comment.
As discussed offline, I think this will bring more problems than solutions. The fact that agents can run versions different from the auth server's (as long as they stay on the same major version) gives me the creeps.
There is nothing that guarantees that the auth server is not running 14.0.0 and the agents are not running 14.3.17 and this can be a big problem since we can have huge losses in audit logs. The same applies when agents report to prehog (discovery_service does this, machineid does too)
For the cloud it won't be a big problem but....
cc @rosstimothy do you have any concern about this or are you ok with it?
There was a problem hiding this comment.
I agree with @tigrato. Though our docs do suggest that any component is compatible within a patch and minor version, the scale team's position has been that the Auth server should never be running a lower version than any other component in the cluster. In addition to missing events as previously mentioned, having agents running newer versions than Auth can lead to very subtle issues and partial functionality that may increase support load.
There was a problem hiding this comment.
This PR is not about the auth server version, it is about unblocking cloud automatic upgrades rollout because some tenants with mixed workloads block the whole cloud from upgrading.
I understand that you want to add new version restrictions, but I don't think this is the right place. We currently have no restrictions and this change is adding a small restriction on the proxy major version to keep some tenants from breaking. The goal is not to ensure version compatibility across agents, proxy and auth. Cloud was already responsible for this and will continue to be.
There was a problem hiding this comment.
@tigrato @rosstimothy As Hugo mentioned, this is just a supporting endpoint to unblock our ability to resume bumping the stable/cloud version. Also, while I agree to some extent that ideally auth should always be newer, that is not what's documented in our official compatibility guarantees. Not that it particularly matters in this case because, as Hugo said, our Cloud deployment process always makes sure that agents are auto-updated after the control plane.
So this should not cause any issues. I propose we merge this because our inability to bump stable/cloud for the past 2 months has already caused a ton of issues.
There was a problem hiding this comment.
I suggested this because there is already a version check and version limiting. Doing the same for the auth server is trivial since each proxy has access to the auth server version in memory.
I'm not going to block the PR but I think we should have more guarantees that everything works when we decide to make upgrades.
b05df32 to
923288f
Compare
There was a problem hiding this comment.
Does this need to be a pointer?
There was a problem hiding this comment.
It's better because this allows using the same Channel for multiple channel names and reusing the same cache. I'll use this property in a subsequent PR.
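To illustrate the point about pointers, here is a minimal sketch. The `Channel` type below is a hypothetical stand-in (not the PR's actual type), and the `default` channel name is made up for the example. Two names map to one `*Channel`, so the cached value is fetched once.

```go
package main

import "fmt"

// Channel is a hypothetical stand-in for the PR's channel type: it holds
// a cached version that is expensive to fetch.
type Channel struct {
	cachedVersion string
	fetches       int
}

// Version returns the cached version, "fetching" it only on first use.
func (c *Channel) Version() string {
	if c.cachedVersion == "" {
		c.fetches++
		c.cachedVersion = "v14.1.0" // placeholder for a real upstream call
	}
	return c.cachedVersion
}

func main() {
	shared := &Channel{}
	// Both names point at the same Channel, so they share one cache.
	channels := map[string]*Channel{
		"stable/cloud": shared,
		"default":      shared, // hypothetical alias
	}
	channels["stable/cloud"].Version()
	channels["default"].Version()
	fmt.Println(shared.fetches) // prints 1
}
```

With a map of `Channel` values instead of pointers, each entry would carry its own copy and its own cache.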
There was a problem hiding this comment.
Can it actually be empty at this point? We're checking that reqParts has at least 2 elements above. Or is this to check for // in the path or something?
There was a problem hiding this comment.
yeah, if you hit /webapi/automaticupgrades/channel//version you have a reqParts = ["", "version"] and channelName = ""
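A rough sketch of that edge case, assuming the handler receives the part of the path after `/webapi/automaticupgrades/channel` and splits it on `/`. The function name `parseChannelRequest` and the error messages are illustrative, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseChannelRequest splits "/<channel>/<endpoint>" into its two parts.
// The channel name itself may contain slashes (e.g. "stable/cloud").
func parseChannelRequest(request string) (channel, endpoint string, err error) {
	parts := strings.Split(strings.TrimPrefix(request, "/"), "/")
	if len(parts) < 2 {
		return "", "", errors.New("path must look like /<channel>/<endpoint>")
	}
	endpoint = parts[len(parts)-1]
	channel = strings.Join(parts[:len(parts)-1], "/")
	// "//version" passes the length check above with parts = ["", "version"],
	// so the empty channel name has to be rejected explicitly.
	if channel == "" {
		return "", "", errors.New("channel name is empty")
	}
	return channel, endpoint, nil
}

func main() {
	for _, req := range []string{"/stable/cloud/version", "//version"} {
		channel, endpoint, err := parseChannelRequest(req)
		fmt.Printf("%q -> channel=%q endpoint=%q err=%v\n", req, channel, endpoint, err)
	}
}
```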
abad7eb to
cc8d429
Compare
This PR adds an embedded [version server](https://goteleport.com/docs/architecture/agent-update-management/#version-server-and-source-of-truth) in the proxy to address: gravitational/cloud#6773

The version server can be configured through `teleport.yaml`:

```yaml
proxy_service:
  enabled: "yes"
  automatic_upgrades_channels:
    stable/cloud:
      forward_url: https://updates.releases.teleport.dev/v1/stable/cloud
    preview/cloud:
      static_version: v12.5.4
```

The forwarded call results are cached for a minute.
cc8d429 to
eb76306
Compare
|
@hugoShaka See the table below for backport results.
|
* Add a version server in the proxy + use it in agent chart (#35150)

  This PR adds an embedded [version server](https://goteleport.com/docs/architecture/agent-update-management/#version-server-and-source-of-truth) in the proxy to address: gravitational/cloud#6773

  The version server can be configured through `teleport.yaml`:

  ```yaml
  proxy_service:
    enabled: "yes"
    automatic_upgrades_channels:
      stable/cloud:
        forward_url: https://updates.releases.teleport.dev/v1/stable/cloud
      preview/cloud:
        static_version: v12.5.4
  ```

  The forwarded call results are cached for a minute.

* automatic upgrades: use default version channel everywhere (#35342)

  * Use default upgrade channel

    This commit:
    - initializes default upgrade channels based on the server features
    - makes all integrations use the upgrade channels instead of hitting hardcoded s3 bucket
    - makes the version channel return its own version if the target version is too high
    - makes the NoVersion handler properly: returned as an error. This way someone relying on the version getter doesn't have to check
    - moves the version kube-agent-updater lib in main teleport libs
    - add tests for noVersion channels

  * Update lib/web/join_tokens.go

    Co-authored-by: Bernard Kim <bernard@goteleport.com>

  * address marco's feedback
  * address marco's feedback pt.2

  ---------

  Co-authored-by: Bernard Kim <bernard@goteleport.com>

* Fix teleport.e integrations builds (#35996)
* Move automaticupgrades packages in `lib/automaticupgrades`
* Fix `kube-agent-updater` Dockerfile
* Write handler config (#35998)
* go mod tidy
* Bump controller-runtime v0.16.3
* Use channel

---------

Co-authored-by: Hugo Shaka <hugo.hervieux@goteleport.com>
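The commit message above says the version channel returns its own version when the target version is too high. A minimal sketch of that idea follows, assuming "too high" means a greater major version than the proxy's, as in the review discussion. `majorOf` and `capToProxyMajor` are hypothetical helpers, not functions from the Teleport codebase.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf extracts the major number from a version such as "v14.3.17".
func majorOf(version string) (int, error) {
	head, _, _ := strings.Cut(strings.TrimPrefix(version, "v"), ".")
	major, err := strconv.Atoi(head)
	if err != nil {
		return 0, fmt.Errorf("invalid version %q: %w", version, err)
	}
	return major, nil
}

// capToProxyMajor returns target unless its major version is above the
// proxy's, in which case the proxy's own version is returned instead.
// Minor and patch differences are deliberately ignored here.
func capToProxyMajor(target, proxy string) (string, error) {
	t, err := majorOf(target)
	if err != nil {
		return "", err
	}
	p, err := majorOf(proxy)
	if err != nil {
		return "", err
	}
	if t > p {
		return proxy, nil
	}
	return target, nil
}

func main() {
	for _, target := range []string{"v14.3.17", "v15.0.1"} {
		v, err := capToProxyMajor(target, "v14.0.0")
		fmt.Println(target, "->", v, err)
	}
}
```

Note that this check lets agents run `v14.3.17` against a `v14.0.0` proxy, which is the scenario the reviewers raised concerns about.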
Changelog: Support running a version server in the proxy for automatic agent upgrades
This PR adds an embedded version server in the proxy to address: https://github.com/gravitational/cloud/issues/6773
The version server can be configured through `teleport.yaml`. The forwarded call results are cached for a minute.
For reviewers
To implement this faster, I reused the logic from the `kube-agent-updater`. To do this I had to merge the updater Go module into the root teleport one (they were importing different versions of controller-runtime and kube libs). You might want to be careful about the go.mod changes brought by this PR; I've not noticed anything too suspicious.
Also, I'm surprised about the linter changing those files: 666fbf6