Deployment rollout monitoring revamp #36

Closed
KnVerey opened this issue Feb 27, 2017 · 1 comment · Fixed by #107

KnVerey commented Feb 27, 2017

The fact that deployment rollout monitoring currently looks at all pods associated with the deployment, instead of only the pods in the new ReplicaSet, has caused several different bugs:

  • Deploys never succeed when there are evicted pods associated with the deployment, even though those pods are old (fixed another way)
  • Pod warnings get shown for pods that are being shut down, in the case where the last deploy was bad and the current one is actually succeeding (very confusing output)
  • Deploy success is delayed by waiting for all old pods to fully disappear
  • This false-positive deploy result is likely caused by this. The last poll before it "succeeded" was the log line below (I now think the pod probably became available briefly and then failed a probe, or something like that; see the sketch after the log line)
[KUBESTATUS] {"group":"Deployments","name":"jobs","status_string":"Waiting for rollout to finish: 0 of 1 updated replicas are available...","exists":true,"succeeded":true,"failed":false,"timed_out":false,"replicas":{"updatedReplicas":1,"replicas":1,"availableReplicas":1},"num_pods":1}
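
For context, a minimal Python sketch (the function name is mine, and it assumes a `kubernetes.client`-style Deployment object) of roughly the aggregate condition `kubectl rollout status` checks. Since it only reads Deployment-level counters, a pod that reports Ready for a single poll can satisfy it and then fail a probe afterwards, which would fit the log line above:

```python
def rollout_complete(deployment):
    # Approximation of the Deployment-level check `kubectl rollout status`
    # uses: generation observed, every replica updated, no extra old
    # replicas, and all updated replicas available.
    spec, status = deployment.spec, deployment.status
    return (
        (status.observed_generation or 0) >= deployment.metadata.generation
        and (status.updated_replicas or 0) >= spec.replicas
        and (status.replicas or 0) <= (status.updated_replicas or 0)
        and (status.available_replicas or 0) >= (status.updated_replicas or 0)
    )
```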

Related technical notes:

  • We currently select related pods on the assumption that they are labelled with the deployment name. I believe this is a bad assumption that has flown under the radar only because all our templates happen to be labelled this way by convention. The new ReplicaSet-based version should not rely on it.
  • Here's how kubectl gets the old/new ReplicaSets (rough sketch below)
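
For whoever picks this up, a rough sketch of kubectl's approach, in Python with the `kubernetes` client (function names are illustrative, not from our codebase): the new ReplicaSet is the one whose pod template equals the Deployment's pod template once the controller-added pod-template-hash label is ignored.

```python
import copy

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

def template_ignoring_hash(pod_template):
    # Mirror kubectl's EqualIgnoreHash: drop the pod-template-hash label the
    # controller adds to the ReplicaSet's pod template before comparing.
    tpl = copy.deepcopy(pod_template.to_dict())
    ((tpl.get("metadata") or {}).get("labels") or {}).pop("pod-template-hash", None)
    return tpl

def find_new_replica_set(deployment, namespace):
    # The new ReplicaSet is the one whose pod template matches the
    # Deployment's pod template (hash label ignored).
    selector = ",".join(f"{k}={v}" for k, v in deployment.spec.selector.match_labels.items())
    replica_sets = apps.list_namespaced_replica_set(namespace, label_selector=selector).items
    wanted = template_ignoring_hash(deployment.spec.template)
    return next(
        (rs for rs in replica_sets if template_ignoring_hash(rs.spec.template) == wanted),
        None,
    )
```

The old ReplicaSets are then simply every other ReplicaSet matched by the Deployment's selector.
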
KnVerey added the 🪲 bug label Feb 27, 2017
KnVerey self-assigned this Feb 27, 2017

KnVerey commented Apr 5, 2017

  • We should base the timeout on progressDeadlineSeconds when present.
  • With k8s 1.6, we should select pods based on the new ownerReferences field (rough sketch of both points below).
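
A rough sketch of both points, again in Python with the `kubernetes` client (names and the fallback timeout are illustrative): scope pod selection to the new ReplicaSet via metadata.ownerReferences, and derive the monitoring timeout from spec.progressDeadlineSeconds when it is set.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

def pods_owned_by(replica_set, namespace):
    # k8s 1.6+ stamps metadata.ownerReferences on the pods it creates, so we
    # can scope monitoring to the new ReplicaSet's pods instead of relying on
    # the deployment-name label convention.
    selector = ",".join(f"{k}={v}" for k, v in replica_set.spec.selector.match_labels.items())
    pods = core.list_namespaced_pod(namespace, label_selector=selector).items
    return [
        pod for pod in pods
        if any(ref.uid == replica_set.metadata.uid
               for ref in (pod.metadata.owner_references or []))
    ]

def rollout_timeout(deployment, default=420):
    # Prefer spec.progressDeadlineSeconds when the template sets it; the
    # fallback value here is arbitrary.
    return deployment.spec.progress_deadline_seconds or default
```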
