Skip to content

Support running a version server in the proxy#35150

Merged
hugoShaka merged 1 commit intomasterfrom
hugo/add-proxy-version-server
Dec 12, 2023
Merged

Support running a version server in the proxy#35150
hugoShaka merged 1 commit intomasterfrom
hugo/add-proxy-version-server

Conversation

@hugoShaka
Copy link
Copy Markdown
Contributor

@hugoShaka hugoShaka commented Nov 29, 2023

Changelog: Support running a version server in the proxy for automatic agent upgrades

This PR adds an embedded version server in the proxy to address: https://github.com/gravitational/cloud/issues/6773

The version server can be configured through teleport.yaml:

proxy_service:
  enabled: "yes"
  automatic_upgrades_channels:
    stable/cloud:
      forward_url: https://updates.releases.teleport.dev/v1/stable/cloud
    preview/cloud:
      static_version: v12.5.4
    default:
      static_version: v13.3.0

The forwarded call results are cached for a minute.

For reviewers

To implement faster, I reused the logic from the kube-agent-updater. To do this I had to merge the updater go module in the toor teleport one (they were importing different versions of controller-runtime and kube libs). You might want to be careful about the go.mod changes brought by this PR, I've not noticed anything too suspicious.

Also, I'm surprised about the linter changing those files: 666fbf6

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 1, 2023

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

Copy link
Copy Markdown
Contributor

@bernardjkim bernardjkim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I just have one question about future support for a Teleport client tools version server.

Comment thread lib/automaticupgrades/channel.go Outdated
Comment thread lib/web/apiserver.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 1, 2023

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

@hugoShaka hugoShaka force-pushed the hugo/add-proxy-version-server branch from 7acb18d to 0ac8a66 Compare December 4, 2023 15:20
Comment thread lib/automaticupgrades/channel.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to create another source that would return the Proxy's version?
I would imagine it to be the default one for a lot of scenarios.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea but I think we would be conflicting with some goals from security: they are trying to hide the minor/bugfix versions from the ping endpoint to make it harder for an attacker to identify if the teleport instance is out-of-date and vulnerable.

If people from cloud and security are OK I'll gladly implement that, I think it would solve the needs of 99% of self-hosted users.

Comment thread lib/config/configuration.go Outdated
Comment thread lib/automaticupgrades/channel.go Outdated
Comment thread lib/automaticupgrades/channel.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Comment thread examples/chart/teleport-kube-agent/values.yaml Outdated
Comment thread integrations/kube-agent-updater/cmd/teleport-kube-agent-updater/main.go Outdated
Comment thread integrations/kube-agent-updater/pkg/constants/constants.go Outdated
Comment thread integrations/kube-agent-updater/pkg/version/errors.go Outdated
Comment thread lib/automaticupgrades/channel.go Outdated
Comment thread lib/service/servicecfg/proxy.go Outdated
Comment thread lib/web/apiserver.go Outdated
Comment thread lib/web/apiserver.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we only consider major versions?
There are several cases where if auth server is behind agent version we can loose audit events because auth cannot/doesn't know how to unmarshal them from proto oneof.

We should always enforce proxies/auth to be the highest version (major.minor.patch) before upgrading agents. Sometimes we also use the same approach to release some features during minor upgrades

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request from cloud folks was to explicitly block major versions to unlock their automatic update rollout. I think that running a higher bugfix version of the agents might be valid, at least, it should be compatible according to our compatibility guarantees.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, I think this will bring more problems than solutions. The fact that agents can run multiple versions of the auth server (but always on the same major version) gives me the creeps.
There is nothing that guarantees that the auth server is not running 14.0.0 and the agents are not running 14.3.17 and this can be a big problem since we can have huge losses in audit logs. The same applies when agents report to prehog (discovery_service does this, machineid does too)

For the cloud it won't be a big problem but....

cc @rosstimothy do you have any concern about this or are you ok with it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @tigrato, though our docs do suggest that any component is compatible within a patch and minor version, the scale team's position has been that the Auth server should never be running a lower version than any other component in the cluster. In addition to missing events as previously mentioned, having agents running newer versions than Auth can lead to very subtle issues and partial functionality that may increase support load.

Copy link
Copy Markdown
Contributor Author

@hugoShaka hugoShaka Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is not about the auth server version, it is about unblocking cloud automatic upgrades rollout because some tenants with mixed workloads block the whole cloud from upgrading.

I understand that you want to add new version restrictions, but I don't think this is the right place. We currently have no restrictions and this change is adding a small restriction on the proxy major version to avoid some tenants from breaking. The goal is not to ensure version compatibility across agents, proxy and auth. Cloud was already responsible for this and will continue to be.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tigrato @rosstimothy Like Hugo mentioned, this is just a supporting endpoint to unblock our ability to resume bumping stable/cloud version. Also, while I agree to some extent that ideally auth should always be newer, that is not what's documented in our official compatibility guarantees. Not that it particularly matters in this case because like Hugo said our Cloud deployment process always makes sure that agents are auto-updated after control plane.

So this should not cause any issues, I propose we merge this because our inability to bump stable/cloud for past 2 months already caused a ton of issues.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested this because there is already a version check and version limiting. Doing the same for the auth server is trivial since each proxy has access to the auth server version in memory.

I'm not going to block the PR but I think we should have more guarantees that everything works when we decide to make upgrades.

@hugoShaka hugoShaka force-pushed the hugo/add-proxy-version-server branch from b05df32 to 923288f Compare December 6, 2023 15:43
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 6, 2023

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 6, 2023

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

2 similar comments
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 6, 2023

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 6, 2023

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

Comment thread integration/proxy/automaticupragdes_test.go Outdated
Comment thread integrations/kube-agent-updater/pkg/version/basichttp.go Outdated
Comment thread lib/automaticupgrades/channel.go Outdated
Comment thread lib/automaticupgrades/channel.go Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a pointer?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better because this allows to use the same Channel for multiple channel names and reuse the same cache. I'll use this property in a subsequent PR.

Comment thread lib/automaticupgrades/channel.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Comment thread lib/web/automaticupgrades.go Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it actually be empty at this point? We're checking that reqParts has at least 2 elements above. Or is this to check for // in the path or something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, if you hit /webapi/automaticupgrades/channel//version you have a reqParts = ["", "version"] and channelName = ""

@public-teleport-github-review-bot public-teleport-github-review-bot Bot removed the request for review from sclevine December 6, 2023 19:55
Comment thread lib/web/automaticupgrades.go Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tigrato @rosstimothy Like Hugo mentioned, this is just a supporting endpoint to unblock our ability to resume bumping stable/cloud version. Also, while I agree to some extent that ideally auth should always be newer, that is not what's documented in our official compatibility guarantees. Not that it particularly matters in this case because like Hugo said our Cloud deployment process always makes sure that agents are auto-updated after control plane.

So this should not cause any issues, I propose we merge this because our inability to bump stable/cloud for past 2 months already caused a ton of issues.

@hugoShaka hugoShaka force-pushed the hugo/add-proxy-version-server branch 2 times, most recently from abad7eb to cc8d429 Compare December 7, 2023 23:34
This PR adds an embedded [version server](https://goteleport.com/docs/architecture/agent-update-management/#version-server-and-source-of-truth) in the proxy to address: gravitational/cloud#6773

The version server can be configured through `teleport.yaml`:

```yaml
proxy_service:
  enabled: "yes"
  automatic_upgrades_channels:
    stable/cloud:
      forward_url: https://updates.releases.teleport.dev/v1/stable/cloud
    preview/cloud:
      static_version: v12.5.4
```

The forwarded call results are cached for a minute.
@hugoShaka hugoShaka force-pushed the hugo/add-proxy-version-server branch from cc8d429 to eb76306 Compare December 12, 2023 00:44
@hugoShaka hugoShaka enabled auto-merge December 12, 2023 00:44
@hugoShaka hugoShaka added this pull request to the merge queue Dec 12, 2023
Merged via the queue into master with commit 856bb39 Dec 12, 2023
@hugoShaka hugoShaka deleted the hugo/add-proxy-version-server branch December 12, 2023 01:22
@public-teleport-github-review-bot
Copy link
Copy Markdown

@hugoShaka See the table below for backport results.

Branch Result
branch/v13 Failed
branch/v14 Failed

bernardjkim pushed a commit that referenced this pull request Dec 19, 2023
This PR adds an embedded [version server](https://goteleport.com/docs/architecture/agent-update-management/#version-server-and-source-of-truth) in the proxy to address: gravitational/cloud#6773

The version server can be configured through `teleport.yaml`:

```yaml
proxy_service:
  enabled: "yes"
  automatic_upgrades_channels:
    stable/cloud:
      forward_url: https://updates.releases.teleport.dev/v1/stable/cloud
    preview/cloud:
      static_version: v12.5.4
```

The forwarded call results are cached for a minute.
bernardjkim pushed a commit that referenced this pull request Jan 3, 2024
This PR adds an embedded [version server](https://goteleport.com/docs/architecture/agent-update-management/#version-server-and-source-of-truth) in the proxy to address: gravitational/cloud#6773

The version server can be configured through `teleport.yaml`:

```yaml
proxy_service:
  enabled: "yes"
  automatic_upgrades_channels:
    stable/cloud:
      forward_url: https://updates.releases.teleport.dev/v1/stable/cloud
    preview/cloud:
      static_version: v12.5.4
```

The forwarded call results are cached for a minute.
bernardjkim pushed a commit that referenced this pull request Jan 5, 2024
This PR adds an embedded [version server](https://goteleport.com/docs/architecture/agent-update-management/#version-server-and-source-of-truth) in the proxy to address: gravitational/cloud#6773

The version server can be configured through `teleport.yaml`:

```yaml
proxy_service:
  enabled: "yes"
  automatic_upgrades_channels:
    stable/cloud:
      forward_url: https://updates.releases.teleport.dev/v1/stable/cloud
    preview/cloud:
      static_version: v12.5.4
```

The forwarded call results are cached for a minute.
github-merge-queue Bot pushed a commit that referenced this pull request Jan 8, 2024
* Add a version server in the proxy + use it in agent chart (#35150)

This PR adds an embedded [version server](https://goteleport.com/docs/architecture/agent-update-management/#version-server-and-source-of-truth) in the proxy to address: gravitational/cloud#6773

The version server can be configured through `teleport.yaml`:

```yaml
proxy_service:
  enabled: "yes"
  automatic_upgrades_channels:
    stable/cloud:
      forward_url: https://updates.releases.teleport.dev/v1/stable/cloud
    preview/cloud:
      static_version: v12.5.4
```

The forwarded call results are cached for a minute.

* automatic upgrades: use default version channel everywhere (#35342)

* Use default upgrade channel

This commit:
- initializes default upgrade channels based on the server features
- makes all integrations use the upgrade channels instead of hitting
  hardcoded s3 bucket
- makes the version channel return its own version if the target
  version is too high
- makes the NoVersion handler properly: returned as an error. This way
  soneone relying on the version getter doesn't have to check
- moves the version kube-agent-updater lib in main teleport libs
- add tests for noVersion channels

* Update lib/web/join_tokens.go

Co-authored-by: Bernard Kim <bernard@goteleport.com>

* address marco's feedback

* address marco's feedback pt.2

---------

Co-authored-by: Bernard Kim <bernard@goteleport.com>

* Fix teleport.e integrations builds (#35996)

* Move automaticupgrades packages in `lib/automaticupgrades`

* Fix `kube-agent-udpater` Dockerfile

* Write handler config (#35998)

* go mod tidy

* Bump controller-runtime v0.16.3

* Use channel

---------

Co-authored-by: Hugo Shaka <hugo.hervieux@goteleport.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants