
If k8s odr tasks are pending, they will never get cleaned up #2641

Closed

izaaklauer opened this issue Nov 1, 2021 · 1 comment · Fixed by #3143
Labels: bug (Something isn't working), plugin/k8s

Comments


izaaklauer commented Nov 1, 2021

Describe the bug

If k8s tasks do not complete (or error), they will stick around forever.

Steps to Reproduce

  • Enable ODR on a Kubernetes Waypoint installation, and enable app and/or project polling
  • Scale down the worker pool (or otherwise exhaust some cluster resource)
  • Observe that Waypoint keeps spawning pending tasks indefinitely, which will thundering-herd the cluster once additional capacity is added

Expected behavior

This is because we never call stop on k8s tasks:

// Purposely do nothing. We leverage the job TTL feature in Kube 1.19+

This behavior is actually quite useful, because without it there's no way to inspect the logs of a failed poll-based task. Maybe we can call stop after some delay period, or otherwise store the logs somewhere.
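For context, the TTL feature that comment refers to is the Job's ttlSecondsAfterFinished field. Here is a minimal sketch of building an ODR task Job with it set, assuming the plugin constructs a batch/v1 Job via client-go (taskName, image, and the 600s value are placeholders, not the plugin's actual configuration). The key point: the TTL only fires once the Job reaches Complete or Failed, so a Job whose pods stay Pending is never reaped.

```go
import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildTaskJob is a hypothetical helper, not Waypoint's actual code.
func buildTaskJob(taskName, image string) *batchv1.Job {
	ttl := int32(600) // assumed value: reap 10 minutes after the Job finishes
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: taskName},
		Spec: batchv1.JobSpec{
			// Kubernetes 1.19+ deletes the Job (and its pods) this many seconds
			// after it finishes. "Finished" means Complete or Failed, so a Job
			// whose pods never get scheduled stays Pending forever and is never
			// cleaned up -- the behavior described in this issue.
			TTLSecondsAfterFinished: &ttl,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "odr",
						Image: image,
					}},
				},
			},
		},
	}
}
```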

Waypoint Platform Versions
Additional version and platform information to help triage the issue, if applicable:

  • Waypoint CLI Version: 0.6.1
  • Waypoint Server Platform and Version: 0.6.1
  • Waypoint Plugin: k8s

Additional context

Slack thread: https://hashicorp.slack.com/archives/C013QT1KG9W/p1635804287310700

izaaklauer added the bug (Something isn't working) label and removed the new label on Nov 1, 2021
izaaklauer (author) commented:

A nice refinement: leave errored pods around so we can easily inspect their logs at the k8s layer.

Two good points from @catsby:

  • It might not always be important to keep the logs around, because sophisticated users will probably have all pod logs shipped somewhere (like Elasticsearch) that they can consult after the fact.
  • We can probably leave errored pods around, because I think Kubernetes has some kind of auto-reaping policy for errored pods.
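One way to implement the "call stop after some delay" idea, sketched with client-go (this is not Waypoint's actual code; the app=waypoint-task label selector and the maxAge parameter are assumptions): list the task Jobs and delete any that still haven't finished after a deadline, leaving finished ones alone so the Job TTL can reap them and their logs stay inspectable for a while.

```go
import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// reapStuckTaskJobs deletes task Jobs that are still unfinished (e.g. stuck
// Pending) after maxAge. Finished Jobs are skipped so ttlSecondsAfterFinished
// can clean them up.
func reapStuckTaskJobs(ctx context.Context, cs kubernetes.Interface, ns string, maxAge time.Duration) error {
	jobs, err := cs.BatchV1().Jobs(ns).List(ctx, metav1.ListOptions{
		LabelSelector: "app=waypoint-task", // hypothetical label for ODR task Jobs
	})
	if err != nil {
		return err
	}
	policy := metav1.DeletePropagationBackground
	for _, job := range jobs.Items {
		if job.Status.Succeeded > 0 || job.Status.Failed > 0 {
			continue // finished: let the Job TTL handle cleanup
		}
		if time.Since(job.CreationTimestamp.Time) < maxAge {
			continue // still within the grace period
		}
		// Delete the Job and its (pending) pods in the background.
		err := cs.BatchV1().Jobs(ns).Delete(ctx, job.Name, metav1.DeleteOptions{
			PropagationPolicy: &policy,
		})
		if err != nil {
			return err
		}
	}
	return nil
}
```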
