I installed a Kubernetes cluster on AWS and CoreOS hosts with Tack, and the cluster-autoscaler is included as an add-on. This is the yaml they use: https://github.com/kz8s/tack/blob/master/addons/autoscaler/cluster-autoscaler.yml (uses v0.5.2)
After a bit of time with a successful but empty cluster, the autoscaler kicked in and killed 1 of the 3 workers.
The node is no longer shown when doing kubectl get nodes.
The problem is that the worker node is stuck on DeletingNode, which can be seen from thousands of events along the lines of:
Deleting Node ip-10-56-0-138.ec2.internal because it's not present according to cloud provider
Example:
$ kubectl get events
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
3s 6h 4780 ip-10-56-0-138.ec2.internal Node Normal DeletingNode controllermanager Node ip-10-56-0-138.ec2.internal event: Deleting Node ip-10-56-0-138.ec2.internal because it's not present according to cloud provider
(note: count: 4780!)
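To narrow the output down to just that node, something like the following should work (the --field-selector flag may not exist on the older kubectl bundled with this setup, in which case grep is the fallback):

$ kubectl get events --field-selector involvedObject.name=ip-10-56-0-138.ec2.internal
# or, on older kubectl:
$ kubectl get events --all-namespaces | grep ip-10-56-0-138.ec2.internal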
Checking the configmap that the autoscaler creates shows that the worker node that was removed is still somehow registered (the status below reports ready=5 but registered=6).
Is there a problem with the autoscaler? Is it supposed to unregister the node or is this normal?
Is there a way I can get more info about why the DeletingNode event is appearing so often? There must be a reason the node cannot be fully deleted. At one point, a StatefulSet put a PV and PVC on the worker that was deleted; I'm not sure if this could cause an issue with it being unregistered. The PV and PVC were manually removed, with no luck curbing the continuing DeletingNode event stream.
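For reference, the checks and the manual cleanup were roughly along these lines (the PV/PVC names below are placeholders for the ones the StatefulSet created; the pod check is just to confirm nothing is still scheduled on the deleted worker):

# anything still referencing the deleted worker?
$ kubectl get pods --all-namespaces -o wide | grep ip-10-56-0-138
$ kubectl get pv
$ kubectl get pvc --all-namespaces

# manual removal of the StatefulSet's volume objects (names are placeholders):
$ kubectl delete pvc <pvc-name> -n <namespace>
$ kubectl delete pv <pv-name>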
Sorry if this issue is not appropriate; feel free to remove it if that's the case. (It's hard to tell whether this is a bug in the autoscaler or just my use-case.)
The config map in full:
$ kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2017-06-08 17:30:00.417692456 +0000 UTC:
    Cluster-wide:
      Health:    Healthy (ready=5 unready=0 notStarted=0 longNotStarted=0 registered=6)
                 LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                 LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleUp:   NoActivity (ready=5 registered=6)
                 LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                 LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleDown: NoCandidates (candidates=0)
                 LastProbeTime:      2017-06-08 17:30:00.119227722 +0000 UTC
                 LastTransitionTime: 2017-06-08 10:46:54.809754422 +0000 UTC

    NodeGroups:
      Name:      worker-general-test
      Health:    Healthy (ready=2 unready=0 notStarted=0 longNotStarted=0 registered=2 cloudProviderTarget=2 (minSize=1, maxSize=5))
                 LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                 LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleUp:   NoActivity (ready=2 cloudProviderTarget=2)
                 LastProbeTime:      2017-06-08 17:29:59.812893761 +0000 UTC
                 LastTransitionTime: 2017-06-08 10:26:35.872670968 +0000 UTC
      ScaleDown: NoCandidates (candidates=0)
                 LastProbeTime:      2017-06-08 17:30:00.119227722 +0000 UTC
                 LastTransitionTime: 2017-06-08 10:46:54.809754422 +0000 UTC

kind: ConfigMap
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/last-updated: 2017-06-08 17:30:00.417692456 +0000 UTC
  creationTimestamp: 2017-06-08T10:26:25Z
  name: cluster-autoscaler-status
  namespace: kube-system
  resourceVersion: "60900"
  selfLink: /api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status
  uid: ed1780d0-4c34-11e7-bb12-0afa88f15a64
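In case it points somewhere useful, these are the other places I can think of to look for why the deletion never completes (the controller-manager pod name below is a placeholder; on this Tack setup the controller-manager may also run as a static pod or systemd unit, in which case the logs live on the master host):

# does the Node object still exist in the API at all?
$ kubectl get node ip-10-56-0-138.ec2.internal -o yaml

# what the controller-manager says while emitting the DeletingNode events
$ kubectl -n kube-system get pods | grep controller-manager
$ kubectl -n kube-system logs <controller-manager-pod> | grep ip-10-56-0-138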