Do not request resources in teleport-cluster config hooks#61856

Merged
hugoShaka merged 1 commit into master from
hugo/fix-auth-config-job-requests
Dec 2, 2025

@hugoShaka
Contributor

Config check hooks don't run a full Teleport; they only validate the config file. Applying resource requests to them is not useful and can block the rollout in smaller clusters.

This is a papercut I hit when doing a large Teleport deployment in resource-constrained AKS clusters.

Changelog: Prevented stuck teleport-cluster Helm chart rollouts in small Kubernetes clusters. Removed resource requests from configuration check hooks.

@webvictim
Contributor

webvictim commented Dec 1, 2025

Looks like there's a failing test to remove too:

- should set resources on auth predeploy job when set in values

Config check hooks don't run a full Teleport; they only validate
the config file. Applying resource requests to them is not useful
and can block the rollout in smaller clusters.

Changelog: Prevented stuck `teleport-cluster` Helm chart rollouts in smaller clusters. Removed resource requests from configuration check hooks.
@hugoShaka hugoShaka force-pushed the hugo/fix-auth-config-job-requests branch from a192c35 to f0c42f1 December 2, 2025 23:30
@hugoShaka hugoShaka enabled auto-merge December 2, 2025 23:30
@hugoShaka hugoShaka added this pull request to the merge queue Dec 2, 2025
Merged via the queue into master with commit b30719b Dec 2, 2025
43 checks passed
@hugoShaka hugoShaka deleted the hugo/fix-auth-config-job-requests branch December 2, 2025 23:49
@backport-bot-workflows
Contributor

@hugoShaka See the table below for backport results.

| Branch | Result |
|---|---|
| branch/v17 | Create PR |
| branch/v18 | Create PR |

21KennethTran pushed a commit that referenced this pull request Jan 6, 2026
@programmerq
Contributor

programmerq commented Jan 12, 2026

@hugoShaka For clusters that have policies requiring resources to always be set, this will prevent deployments from working. We'll definitely want to make this an opt-out behavior rather than turning resources off everywhere.

See also: #62597
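
To illustrate the kind of policy described above, here is a minimal sketch using a plain Kubernetes ResourceQuota. The quota name, namespace, and values are hypothetical and not taken from any real deployment. When a quota constrains `requests.cpu` or `requests.memory` in a namespace, pods that do not declare those requests are rejected at creation time unless a LimitRange supplies defaults, so a hook Job without requests would fail to start there.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota   # hypothetical name
  namespace: teleport   # hypothetical namespace
spec:
  hard:
    # With these set, every new pod in the namespace must declare
    # CPU and memory requests, or the API server rejects it.
    requests.cpu: "4"
    requests.memory: 8Gi
```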

@hugoShaka
Contributor Author

hugoShaka commented Jan 12, 2026

Thank you, I did not think of such compliance tools. I can revert the change in v17, but I don't think we should bring the old resource behaviour back: requesting 4 CPUs and 16 GiB of memory to run a config linter is absurd and breaks on small clusters.

What I can do is:

  • revert the change in v17 to undo the breaking change
  • add a `jobResource` field in v18 & master
  • set `jobResource` to a small default, like 10 millicores of CPU and 100 MiB of memory
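
As a rough sketch of what the proposal above could look like in chart values: only the `jobResource` name and the two numbers come from this comment. The nested `requests` layout is an assumption that mirrors the standard Kubernetes resource requirements structure, and the shape that eventually ships may differ.

```yaml
# values.yaml (hypothetical shape, for illustration only)
jobResource:
  requests:
    cpu: 10m        # 10 millicores, as proposed above
    memory: 100Mi   # 100 MiB, as proposed above
```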

@programmerq
Contributor

Yes, a separate job resource setting would be great! I think 100 MiB might be too small for a default though. I did a quick test and saw that the maximum RSS for the `teleport configure --test` command was 158 MiB with Teleport 18.2.2 and 202 MiB with Teleport 18.6.2, using the same teleport.yaml file.
