
metrics-server is broken in 1.10.10, 1.10.11, 1.12.3, 1.13.0 #73

Closed
jackfrancis opened this issue Nov 29, 2018 · 18 comments · Fixed by #178
Labels
bug Something isn't working

Comments

@jackfrancis
Member

Let's fix this!

jackfrancis added the bug label on Nov 29, 2018
@jackfrancis
Member Author

$ kubectl describe hpa
Name:                                                  php-apache-long-running
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Wed, 28 Nov 2018 16:36:05 -0800
Reference:                                             Deployment/php-apache-long-running
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 5%
Min replicas:                                          1
Max replicas:                                          10
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Events:
  Type     Reason                        Age               From                       Message
  ----     ------                        ----              ----                       -------
  Warning  FailedGetResourceMetric       6s (x11 over 5m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  6s (x11 over 5m)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
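
As a sanity check (a minimal sketch, not from the original report; assumes cluster-admin kubectl access), the resource metrics API can be queried directly to confirm it has no data for these pods:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods

On a healthy cluster this returns a PodMetricsList with per-container cpu and memory usage; an empty items list matches the FailedGetResourceMetric condition above.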

@jackfrancis
Member Author

$ kubectl logs heapster-75f5b5c44c-blv2k -n kube-system -c heapster
I1129 00:31:20.330019       1 heapster.go:78] /heapster --source=kubernetes.summary_api:''
I1129 00:31:20.330201       1 heapster.go:79] Heapster version v1.5.1
I1129 00:31:20.330899       1 configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
I1129 00:31:20.330923       1 configs.go:62] Using kubelet port 10255
I1129 00:31:20.332252       1 heapster.go:202] Starting with Metric Sink
I1129 00:31:20.526184       1 heapster.go:112] Starting heapster on port 8082
W1129 00:32:25.022174       1 manager.go:152] Failed to get all responses in time (got 0/3)
W1129 00:33:25.000493       1 manager.go:152] Failed to get all responses in time (got 0/3)
E1129 00:34:15.950448       1 manager.go:101] Error in scraping containers from kubelet_summary:10.240.0.33:10255: Get http://10.240.0.33:10255/stats/summary/: dial tcp 10.240.0.33:10255: getsockopt: connection timed out
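
To reproduce that failure outside of heapster, the kubelet read-only summary endpoint can be probed directly from another node or from a pod on the cluster network (a sketch; the node IP is taken from the log line above and curl is assumed to be available):

$ curl -sv http://10.240.0.33:10255/stats/summary

If this also times out, the kubelet read-only port is unreachable over the network, independent of heapster.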

jackfrancis changed the title from "HPA is broken in 1.11.5" to "HPA is broken in 1.10.11" on Nov 29, 2018
@jackfrancis
Member Author

On a working 1.10.9 cluster we don't see the 10255 errors:

$ kubectl logs heapster-75f5b5c44c-wh7ph -n kube-system -c heapster
I1129 17:57:41.308987       1 heapster.go:78] /heapster --source=kubernetes.summary_api:''
I1129 17:57:41.309032       1 heapster.go:79] Heapster version v1.5.1
I1129 17:57:41.309494       1 configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
I1129 17:57:41.309518       1 configs.go:62] Using kubelet port 10255
I1129 17:57:41.310839       1 heapster.go:202] Starting with Metric Sink
I1129 17:57:41.408278       1 heapster.go:112] Starting heapster on port 8082
E1129 17:58:11.312374       1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
E1129 17:58:11.406214       1 reflector.go:190] k8s.io/heapster/metrics/processors/namespace_based_enricher.go:89: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
E1129 17:58:11.406229       1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
E1129 17:58:11.406291       1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
E1129 17:58:11.406300       1 reflector.go:190] k8s.io/heapster/metrics/heapster.go:328: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout

@CecileRobertMichon
Contributor

Tried bumping heapster to v1.5.3, but that didn't solve the problem:

kubectl logs heapster-685c7dcc7c-vkf7h -n kube-system -c heapster
I1130 23:07:10.483675       1 heapster.go:78] /heapster --source=kubernetes.summary_api:''
I1130 23:07:10.483742       1 heapster.go:79] Heapster version v1.5.3
I1130 23:07:10.484116       1 configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
I1130 23:07:10.484144       1 configs.go:62] Using kubelet port 10255
I1130 23:07:10.485124       1 heapster.go:202] Starting with Metric Sink
I1130 23:07:10.568383       1 heapster.go:112] Starting heapster on port 8082
E1130 23:07:40.486874       1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
E1130 23:07:40.487699       1 reflector.go:190] k8s.io/heapster/metrics/heapster.go:328: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
E1130 23:07:40.487702       1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
E1130 23:07:40.566808       1 reflector.go:190] k8s.io/heapster/metrics/processors/namespace_based_enricher.go:89: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
E1130 23:07:40.566883       1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout
E1130 23:08:12.211218       1 manager.go:101] Error in scraping containers from kubelet_summary:10.240.0.95:10255: Get http://10.240.0.95:10255/stats/summary/: dial tcp 10.240.0.95:10255: getsockopt: no route to host

@CecileRobertMichon
Contributor

Further investigation suggests this is related to the heapster ClusterRole lacking privileges (kubernetes-retired/heapster#1936). Adding nodes/stats didn't fix the issue; trying with create next.
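
For reference, this is roughly the shape of the rule that was tried (a sketch only; the system:heapster role name and the verb list are assumptions based on the linked heapster issue, not the exact manifest used in this cluster):

kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:heapster
rules:
- apiGroups: [""]
  resources: ["events", "namespaces", "nodes", "pods", "nodes/stats"]
  verbs: ["get", "list", "watch"]
EOF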

@CecileRobertMichon
Contributor

Adding the create verb didn't help either.

@CecileRobertMichon
Contributor

Metrics-server logs show:

 error while getting metrics summary from Kubelet k8s-agentpool1-21511156-vmss000001(10.240.0.65:10255): Get http://10.240.0.65:10255/stats/summary/: dial tcp 10.240.0.65:10255: getsockopt: no route to host
W1201 00:30:09.309236       1 manager.go:102] Failed to get kubelet_summary:10.240.0.65:10255 response in time
I1201 00:30:09.769890       1 reststorage.go:140] No metrics for container php-apache-long-running in pod default/php-apache-long-running-774fddb9d6-hhccn

@CecileRobertMichon
Contributor

kubectl top nodes is also not working.

@CecileRobertMichon
Contributor

Changing the heapster source to --source=kubernetes.summary_api:''?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true didn't help either.

@CecileRobertMichon
Contributor

Changing the metrics-server source to --source=kubernetes.summary_api:''?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true didn't fix the issue either.

CecileRobertMichon changed the title from "HPA is broken in 1.10.11" to "HPA is broken in 1.10.10, 1.10.11, 1.12.3" on Dec 1, 2018
@CecileRobertMichon
Contributor

CecileRobertMichon commented Dec 1, 2018

With the following flags added to metrics-server:

command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
kubectl logs metrics-server-f695b845-dmtnx -n kube-system
I1201 18:55:06.774972       1 heapster.go:71] /metrics-server --source=kubernetes.summary_api:''
I1201 18:55:06.775013       1 heapster.go:72] Metrics Server version v0.2.1
I1201 18:55:06.775133       1 configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version
I1201 18:55:06.775142       1 configs.go:62] Using kubelet port 10255
I1201 18:55:06.775980       1 heapster.go:128] Starting with Metric Sink
I1201 18:55:07.054235       1 serving.go:308] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I1201 18:55:07.377880       1 heapster.go:101] Starting Heapster API server...
[restful] 2018/12/01 18:55:07 log.go:33: [restful/swagger] listing is available at https:///swaggerapi
[restful] 2018/12/01 18:55:07 log.go:33: [restful/swagger] https:///swaggerui/ is mapped to folder /swagger-ui/
I1201 18:55:07.381278       1 serve.go:85] Serving securely on 0.0.0.0:443
I1201 18:55:11.018606       1 reststorage.go:93] No metrics for pod default/php-apache-long-running-774fddb9d6-649mk
I1201 18:55:41.052838       1 reststorage.go:93] No metrics for pod default/php-apache-long-running-774fddb9d6-649mk
I1201 18:56:11.066080       1 reststorage.go:93] No metrics for pod default/php-apache-long-running-774fddb9d6-649mk

kubernetes-sigs/metrics-server#131 (comment)

@CecileRobertMichon
Contributor

Confirmed that setting "--read-only-port": "10255" (Azure/acs-engine#4307) doesn't fix the issue either.
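
One way to tell whether this is a kubelet configuration problem or a network-path problem (a sketch; run directly on an agent node, assumes curl and ss are present):

# is the kubelet listening on the read-only port?
sudo ss -lntp | grep 10255
# does the summary endpoint answer locally?
curl -s http://localhost:10255/stats/summary | head

If both succeed on the node but the same request times out from other nodes (as in the heapster and metrics-server logs above), the kubelet flag is not the problem and the failure is on the network path between nodes.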

@CecileRobertMichon
Contributor

CecileRobertMichon commented Dec 1, 2018

From kubectl top nodes (verbose output):

kubectl top nodes --heapster-service=metrics-server --heapster-port= --heapster-namespace=kube-system --heapster-scheme= --v=10
I1201 13:13:05.441801   91354 loader.go:359] Config loaded from file _output/kubernetes-westus2-28471/kubeconfig/kubeconfig.westus2.json
I1201 13:13:05.443829   91354 loader.go:359] Config loaded from file _output/kubernetes-westus2-28471/kubeconfig/kubeconfig.westus2.json
I1201 13:13:05.444173   91354 round_trippers.go:386] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.1 (darwin/amd64) kubernetes/4ed3216" 'https://kubernetes-westus2-28471.westus2.cloudapp.azure.com/api?timeout=32s'
I1201 13:13:05.681981   91354 round_trippers.go:405] GET https://kubernetes-westus2-28471.westus2.cloudapp.azure.com/api?timeout=32s 200 OK in 237 milliseconds
I1201 13:13:05.682064   91354 round_trippers.go:411] Response Headers:
I1201 13:13:05.682123   91354 round_trippers.go:414]     Content-Length: 134
I1201 13:13:05.682180   91354 round_trippers.go:414]     Date: Sat, 01 Dec 2018 21:13:05 GMT
I1201 13:13:05.682190   91354 round_trippers.go:414]     Content-Type: application/json
I1201 13:13:05.682380   91354 request.go:942] Response Body: {"kind":"APIVersions","versions":["v1"],"serverAddressByClientCIDRs":[{"clientCIDR":"0.0.0.0/0","serverAddress":"10.255.255.5:443"}]}
I1201 13:13:05.682746   91354 round_trippers.go:386] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.1 (darwin/amd64) kubernetes/4ed3216" 'https://kubernetes-westus2-28471.westus2.cloudapp.azure.com/apis?timeout=32s'
I1201 13:13:05.709601   91354 round_trippers.go:405] GET https://kubernetes-westus2-28471.westus2.cloudapp.azure.com/apis?timeout=32s 200 OK in 26 milliseconds
I1201 13:13:05.709630   91354 round_trippers.go:411] Response Headers:
I1201 13:13:05.709636   91354 round_trippers.go:414]     Content-Type: application/json
I1201 13:13:05.709642   91354 round_trippers.go:414]     Content-Length: 3865
I1201 13:13:05.709647   91354 round_trippers.go:414]     Date: Sat, 01 Dec 2018 21:13:05 GMT
I1201 13:13:05.710848   91354 request.go:942] Response Body: {"kind":"APIGroupList","apiVersion":"v1","groups":[{"name":"apiregistration.k8s.io","versions":[{"groupVersion":"apiregistration.k8s.io/v1","version":"v1"},{"groupVersion":"apiregistration.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"apiregistration.k8s.io/v1","version":"v1"}},{"name":"extensions","versions":[{"groupVersion":"extensions/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"extensions/v1beta1","version":"v1beta1"}},{"name":"apps","versions":[{"groupVersion":"apps/v1","version":"v1"},{"groupVersion":"apps/v1beta2","version":"v1beta2"},{"groupVersion":"apps/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"apps/v1","version":"v1"}},{"name":"events.k8s.io","versions":[{"groupVersion":"events.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"events.k8s.io/v1beta1","version":"v1beta1"}},{"name":"authentication.k8s.io","versions":[{"groupVersion":"authentication.k8s.io/v1","version":"v1"},{"groupVersion":"authentication.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"authentication.k8s.io/v1","version":"v1"}},{"name":"authorization.k8s.io","versions":[{"groupVersion":"authorization.k8s.io/v1","version":"v1"},{"groupVersion":"authorization.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"authorization.k8s.io/v1","version":"v1"}},{"name":"autoscaling","versions":[{"groupVersion":"autoscaling/v1","version":"v1"},{"groupVersion":"autoscaling/v2beta1","version":"v2beta1"},{"groupVersion":"autoscaling/v2beta2","version":"v2beta2"}],"preferredVersion":{"groupVersion":"autoscaling/v1","version":"v1"}},{"name":"batch","versions":[{"groupVersion":"batch/v1","version":"v1"},{"groupVersion":"batch/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"batch/v1","version":"v1"}},{"name":"certificates.k8s.io","versions":[{"groupVersion":"certificates.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"certificates.k8s.io/v1beta1","version":"v1beta1"}},{"name":"networking.k8s.io","versions":[{"groupVersion":"networking.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"networking.k8s.io/v1","version":"v1"}},{"name":"policy","versions":[{"groupVersion":"policy/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"policy/v1beta1","version":"v1beta1"}},{"name":"rbac.authorization.k8s.io","versions":[{"groupVersion":"rbac.authorization.k8s.io/v1","version":"v1"},{"groupVersion":"rbac.authorization.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"rbac.authorization.k8s.io/v1","version":"v1"}},{"name":"storage.k8s.io","versions":[{"groupVersion":"storage.k8s.io/v1","version":"v1"},{"groupVersion":"storage.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"storage.k8s.io/v1","version":"v1"}},{"name":"admissionregistration.k8s.io","versions":[{"groupVersion":"admissionregistration.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"admissionregistration.k8s.io/v1beta1","version":"v1beta1"}},{"name":"apiextensions.k8s.io","versions":[{"groupVersion":"apiextensions.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"apiextensions.k8s.io/v1beta1","version":"v1beta1"}},{"name":"scheduling.k8s.io","versions":[{"groupVersion":"scheduling.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"scheduling.k8s.io/v1beta1","version":"v1beta1"}},{"name":"c
oordination.k8s.io","versions":[{"groupVersion":"coordination.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"coordination.k8s.io/v1beta1","version":"v1beta1"}},{"name":"metrics.k8s.io","versions":[{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}}]}
I1201 13:13:05.711401   91354 round_trippers.go:386] curl -k -v -XGET  -H "User-Agent: kubectl/v1.12.1 (darwin/amd64) kubernetes/4ed3216" -H "Accept: application/json, */*" 'https://kubernetes-westus2-28471.westus2.cloudapp.azure.com/apis/metrics.k8s.io/v1beta1/nodes'
I1201 13:13:05.734222   91354 round_trippers.go:405] GET https://kubernetes-westus2-28471.westus2.cloudapp.azure.com/apis/metrics.k8s.io/v1beta1/nodes 200 OK in 22 milliseconds
I1201 13:13:05.734239   91354 round_trippers.go:411] Response Headers:
I1201 13:13:05.734245   91354 round_trippers.go:414]     Audit-Id: daad721c-0a51-4a8f-a096-885936248f70
I1201 13:13:05.734250   91354 round_trippers.go:414]     Content-Type: application/json
I1201 13:13:05.734255   91354 round_trippers.go:414]     Date: Sat, 01 Dec 2018 21:13:05 GMT
I1201 13:13:05.734260   91354 round_trippers.go:414]     Content-Length: 137
I1201 13:13:05.734520   91354 request.go:942] Response Body: {"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/metrics.k8s.io/v1beta1/nodes"},"items":[]}
F1201 13:13:05.739608   91354 helpers.go:119] error: metrics not available yet

@CecileRobertMichon
Contributor

Update: this works with kubenet.

jackfrancis changed the title from "HPA is broken in 1.10.10, 1.10.11, 1.12.3" to "HPA is broken in 1.10.10, 1.10.11, 1.12.3, 1.13.0" on Dec 3, 2018
jackfrancis changed the title from "HPA is broken in 1.10.10, 1.10.11, 1.12.3, 1.13.0" to "metrics-server is broken in 1.10.10, 1.10.11, 1.12.3, 1.13.0" on Dec 4, 2018
@abhishekunotech

@CecileRobertMichon, I am running a v1.13.0 cluster (1 master, 2 workers) with Flannel. Is there anything I can contribute towards testing this issue?

@CecileRobertMichon
Contributor

Thank you @abhishekunotech, we found the root cause. It turned out to be a newly introduced bug upstream that broke Azure CNI. Here's the PR to fix it:

kubernetes/kubernetes#71736

@mboersma
Member

At this point aks-engine is just waiting on the Kubernetes 1.10.12 release so that the fix is available in all affected release series.

@mathieu-benoit
Contributor

mathieu-benoit commented Dec 19, 2018

FYI @mboersma, it looks like k8s 1.10.12 has just been released ;)
https://github.com/kubernetes/kubernetes/releases/tag/v1.10.12
