kubeletstats receiver: "Get \"https://NODE_NAME:10250/stats/summary\": dial tcp: lookup NODE_NAME on 10.245.0.10:53: no such host" #22843

Closed
devurandom opened this issue May 27, 2023 · 11 comments · Fixed by SigNoz/charts#380
Labels
question Further information is requested receiver/kubeletstats

Comments

@devurandom

Component(s)

receiver/kubeletstats

What happened?

Description

I set up the OpenTelemetry Collector as an agent on each node using the Helm Chart and the values.yaml template below. The kubeletstats receiver tries to resolve its endpoint, the Kubernetes node name that the Helm Chart injects, via the cluster's DNS, but the lookup fails (see logs below).

This seems similar to how the Kubernetes metrics-server tried to do the same and then was changed to resolve the node via the Kubernetes API and the InternalIP field of the node status.
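To make the InternalIP idea concrete, here is a minimal sketch, assuming a Node object has already been fetched from the Kubernetes API and decoded into a dict. The helper name `node_internal_ip`, the sample node, and the address 10.0.0.12 are hypothetical, and this is not metrics-server's actual code.

```python
from typing import Optional

KUBELET_PORT = 10250  # kubelet port seen in the error message


def node_internal_ip(node: dict) -> Optional[str]:
    """Return the first InternalIP listed in a Node object's status, if any."""
    for addr in node.get("status", {}).get("addresses", []):
        if addr.get("type") == "InternalIP":
            return addr.get("address")
    return None


# Hypothetical Node object, trimmed to the fields used here.
sample_node = {
    "metadata": {"name": "NODE_NAME"},
    "status": {
        "addresses": [
            {"type": "Hostname", "address": "NODE_NAME"},
            {"type": "InternalIP", "address": "10.0.0.12"},
        ]
    },
}

# Build the kubelet endpoint from the IP instead of the node name,
# so no DNS lookup of the node name is needed.
endpoint = f"{node_internal_ip(sample_node)}:{KUBELET_PORT}"
print(endpoint)
```

The point of the sketch is only that the node's IP is available from the API without any DNS lookup of the node name.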

Steps to Reproduce

See values.yaml file below. I deployed it to a DigitalOcean Kubernetes cluster.

Expected Result

The kubeletstats receiver should reach the kubelet on its own node and scrape /stats/summary without any extra DNS setup.

Actual Result

See the error message and stack trace below.

Collector version

0.77.0

Environment information

Environment

OS: DigitalOcean Kubernetes

OpenTelemetry Collector configuration

# Helm values.yaml template
# ${redeploy_trigger}
fullnameOverride: "collector"
mode: "daemonset"
image:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/22815
  tag: "0.77.0"
service:
  enabled: true
presets:
  logsCollection:
    enabled: true
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
  kubeletMetrics:
    enabled: true
config:
  extensions:
    basicauth/otlp:
      client_auth:
        username: "$${env:INSTANCE_ID}"
        password: "$${env:PUBLISHER_TOKEN}"
  processors:
    # https://community.grafana.com/t/error-sending-logs-with-loki-resource-labels/87561
    resource:
      attributes:
        - action: insert
          key: service_name
          from_attribute: service.name
        - action: insert
          key: service_namespace
          from_attribute: service.namespace
        - action: insert
          key: service_version
          from_attribute: service.version
        - action: insert
          key: deployment_environment
          from_attribute: deployment.environment
        - action: insert
          key: loki.resource.labels
          value: service_name,service_namespace,service_version,deployment_environment
  receivers:
    jaeger: null
    zipkin: null
  exporters:
    otlphttp:
      auth:
        authenticator: basicauth/otlp
      endpoint: "$${env:EXPORT_URL}"
  service:
    extensions:
      - health_check
      - memory_ballast
      - basicauth/otlp
    pipelines:
      metrics:
        receivers:
          - otlp
          - prometheus
        exporters:
          - otlphttp
      traces:
        receivers:
          - otlp
        exporters:
          - otlphttp
      logs:
        receivers:
          - otlp
        processors:
          - resource
          - memory_limiter
          - batch
        exporters:
          - otlphttp
extraEnvs:
  - name: INSTANCE_ID
    valueFrom:
      secretKeyRef:
        name: "${grafana_secret}"
        key: instance_id
  - name: PUBLISHER_TOKEN
    valueFrom:
      secretKeyRef:
        name: "${grafana_secret}"
        key: publisher_token
  - name: EXPORT_URL
    valueFrom:
      secretKeyRef:
        name: "${grafana_secret}"
        key: export_url

Log output

2023-05-27T19:12:22.078Z	error	kubeletstatsreceiver@v0.77.0/scraper.go:79	call to /stats/summary endpoint failed	{"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://NODE_NAME:10250/stats/summary\": dial tcp: lookup NODE_NAME on 10.245.0.10:53: no such host"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver.(*kubletScraper).scrape
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver@v0.77.0/scraper.go:79
go.opentelemetry.io/collector/receiver/scraperhelper.ScrapeFunc.Scrape
	go.opentelemetry.io/collector/receiver@v0.77.0/scraperhelper/scraper.go:31
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
	go.opentelemetry.io/collector/receiver@v0.77.0/scraperhelper/scrapercontroller.go:209
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
	go.opentelemetry.io/collector/receiver@v0.77.0/scraperhelper/scrapercontroller.go:191
2023-05-27T19:12:22.078Z	error	scraperhelper/scrapercontroller.go:212	Error scraping metrics	{"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://NODE_NAME:10250/stats/summary\": dial tcp: lookup NODE_NAME on 10.245.0.10:53: no such host", "scraper": "kubeletstats"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
	go.opentelemetry.io/collector/receiver@v0.77.0/scraperhelper/scrapercontroller.go:212
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
	go.opentelemetry.io/collector/receiver@v0.77.0/scraperhelper/scrapercontroller.go:191

Additional context

Is there a workaround, e.g. relying on metrics-server or kube-state-metrics? How would I configure that?

@devurandom devurandom added bug Something isn't working needs triage New item requiring triage labels May 27, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • receiver/kubeletstats: @dmitryax
  • needs: Github issue template generation code needs this to generate the corresponding labels.

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 27, 2023
@devurandom
Author

@open-telemetry/collector-contrib-triagers

@crobert-1
Member

Apologies for the delay here. Is this the full and correct config file? I'm confused as to why you're getting kubeletstats receiver errors when there's no kubeletstats receiver defined, or in your pipeline shown.

@devurandom
Author

devurandom commented Oct 1, 2023

Apologies for the delay here. Is this the full and correct config file? I'm confused as to why you're getting kubeletstats receiver errors when there's no kubeletstats receiver defined, or in your pipeline shown.

Thank you for your response!

The Helm Chart enables the kubeletstats receiver under the kubeletMetrics flag:

{{- if .Values.presets.kubeletMetrics.enabled }}
{{- $config = (include "opentelemetry-collector.applyKubeletMetricsConfig" (dict "Values" $data "config" $config) | fromYaml) }}
{{- end }}
[...]
{{- tpl (toYaml $config) . }}

(from https://github.com/open-telemetry/opentelemetry-helm-charts/blob/3471a2afe2d5e01d23b4bc02f62ef70077c7dcc7/charts/opentelemetry-collector/templates/_config.tpl#L43-L45 and https://github.com/open-telemetry/opentelemetry-helm-charts/blob/3471a2afe2d5e01d23b4bc02f62ef70077c7dcc7/charts/opentelemetry-collector/templates/_config.tpl#L52, the version of the Helm Chart I tried might have been different)

applyKubeletMetricsConfig is:

{{- define "opentelemetry-collector.applyKubeletMetricsConfig" -}}
{{- $config := mustMergeOverwrite (include "opentelemetry-collector.kubeletMetricsConfig" .Values | fromYaml) .config }}
{{- $_ := set $config.service.pipelines.metrics "receivers" (append $config.service.pipelines.metrics.receivers "kubeletstats" | uniq)  }}
{{- $config | toYaml }}
{{- end }}

{{- define "opentelemetry-collector.kubeletMetricsConfig" -}}
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: "serviceAccount"
    endpoint: "${env:K8S_NODE_NAME}:10250"
{{- end }}

(from https://github.com/open-telemetry/opentelemetry-helm-charts/blob/3471a2afe2d5e01d23b4bc02f62ef70077c7dcc7/charts/opentelemetry-collector/templates/_config.tpl#L151-L163, the version of the Helm Chart I tried might have been different)
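The effect of the two templates above can be approximated outside Helm. The following is a rough sketch, assuming that `mustMergeOverwrite` behaves like a recursive dict merge in which the user's config wins, and that `append ... | uniq` is an order-preserving de-duplication. The function names `merge_overwrite` and `apply_kubelet_metrics_config` are hypothetical, and the `K8S_NODE_IP` override is only an illustration of a user-supplied endpoint.

```python
import copy


def merge_overwrite(dst: dict, src: dict) -> dict:
    """Recursively merge src into dst; values from src win on conflict."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            merge_overwrite(dst[key], value)
        else:
            dst[key] = copy.deepcopy(value)
    return dst


def apply_kubelet_metrics_config(user_config: dict) -> dict:
    """Approximate what the kubeletMetrics preset does to the collector config."""
    # Defaults from the "kubeletMetricsConfig" template quoted above.
    preset = {
        "receivers": {
            "kubeletstats": {
                "collection_interval": "20s",
                "auth_type": "serviceAccount",
                "endpoint": "${env:K8S_NODE_NAME}:10250",
            }
        }
    }
    config = merge_overwrite(preset, user_config)
    metrics = config["service"]["pipelines"]["metrics"]
    # Append "kubeletstats" to the metrics pipeline and drop duplicates,
    # keeping the original order.
    metrics["receivers"] = list(dict.fromkeys(metrics["receivers"] + ["kubeletstats"]))
    return config


# User config that only overrides the endpoint (illustrative values).
user_config = {
    "receivers": {"kubeletstats": {"endpoint": "${env:K8S_NODE_IP}:10250"}},
    "service": {"pipelines": {"metrics": {"receivers": ["otlp", "prometheus"]}}},
}

rendered = apply_kubelet_metrics_config(user_config)
print(rendered["receivers"]["kubeletstats"])
print(rendered["service"]["pipelines"]["metrics"]["receivers"])
```

Under these assumptions, a user-supplied `endpoint` replaces the preset's `${env:K8S_NODE_NAME}:10250`, while the other preset keys are kept.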

@paulohmorais

Same problem here

DigitalOcean K8s

The IP in the error is the kube-dns Service:

kube-system kube-dns ClusterIP 10.245.0.10 dns:53►0╱UDP dns-tcp:53►0 metrics:9153►0

Error logs:

signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x 2023-09-06T22:25:02.668Z    error    scraperhelper/scrapercontroller.go:213    Error scraping metrics    {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://w-pool-uy4qaitgx-y74ag:10250/stats/summary\": dial tcp: lookup w-pool-uy4qaitgx-y74ag on 10.245.0.10:53: no such host", "scraper": "kubeletstats"}
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x     go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:213
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x     go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:192
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x 2023-09-06T22:25:32.665Z    error    [email protected]/scraper.go:68    call to /stats/summary endpoint failed    {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://w-pool-uy4qaitgx-y74ag:10250/stats/summary\": dial tcp: lookup w-pool-uy4qaitgx-y74ag on 10.245.0.10:53: no such host"}
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver.(*kubletScraper).scrape
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x     github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]/scraper.go:68
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x go.opentelemetry.io/collector/receiver/scraperhelper.ScrapeFunc.Scrape
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x     go.opentelemetry.io/collector/[email protected]/scraperhelper/scraper.go:20
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x     go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:210
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
signoz-k8s-metrics-exporter-k8s-infra-otel-agent-dnl9x     go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:192

@jinja2
Contributor

jinja2 commented Oct 12, 2023

Usually, cloud-provider-managed compute services have a private DNS setup for VMs, and the node's local domain is then appended to the /etc/resolv.conf that the kubelet passes to k8s pods, which makes the k8s node name resolvable (when it is the same as the VM's hostname). I am not familiar with DigitalOcean, but maybe they don't have a similar setup, or the instances are not assigned private DNS names. Either way, you can use the node IP instead, which can be passed as an env var with a snippet like the one below:

env:
- name: K8S_NODE_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

and then in the kubeletstats receiver

receivers:
  kubeletstats:
    endpoint: "${env:K8S_NODE_IP}:10250"

I would suggest opening an issue in the chart repo to add an option for using the node IP, and closing this one. This isn't something to fix in the receiver; it can be resolved by changing the config passed to it.
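As a diagnostic aid for the explanation above, here is a small sketch that checks whether the node name resolves from inside a pod and otherwise falls back to the node IP. It is not something the receiver or the chart does. The helper names `resolves` and `kubelet_endpoint` are hypothetical, the sample values are made up, and the sketch assumes an IPv4 node address.

```python
import socket
from typing import Callable, Mapping

KUBELET_PORT = 10250  # kubelet port used in the endpoint above


def resolves(host: str) -> bool:
    """Return True if the host name can be resolved with the pod's DNS settings."""
    try:
        socket.getaddrinfo(host, KUBELET_PORT)
        return True
    except socket.gaierror:
        return False


def kubelet_endpoint(
    env: Mapping[str, str],
    can_resolve: Callable[[str], bool] = resolves,
) -> str:
    """Pick host:port for the kubelet, preferring a resolvable node name."""
    name = env.get("K8S_NODE_NAME")
    ip = env.get("K8S_NODE_IP")  # assumed to be an IPv4 address
    if name and can_resolve(name):
        return f"{name}:{KUBELET_PORT}"
    if ip:
        return f"{ip}:{KUBELET_PORT}"
    raise RuntimeError("neither a resolvable K8S_NODE_NAME nor K8S_NODE_IP is set")


# Made-up environment; the resolver is stubbed so the example needs no network.
example_env = {"K8S_NODE_NAME": "NODE_NAME", "K8S_NODE_IP": "10.0.0.12"}
print(kubelet_endpoint(example_env, can_resolve=lambda host: False))
```

Inside a real pod, calling `kubelet_endpoint(os.environ)` (after `import os`) would use the actual resolver configured in /etc/resolv.conf.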

@crobert-1 crobert-1 added question Further information is requested and removed needs triage New item requiring triage bug Something isn't working labels Oct 12, 2023
@crobert-1
Member

@devurandom Let us know if any other information would be helpful! Otherwise, feel free to close the issue and open another one in the chart repo as @jinja2 suggested.

@crobert-1
Member

I'm going to close this issue for now, but please feel free to let us know if you have any other questions.

@crobert-1 crobert-1 closed this as not planned Won't fix, can't repro, duplicate, stale Oct 30, 2023
zimbatm added a commit to zimbatm/charts-1 that referenced this issue Jan 14, 2024
Not all k8s deployments resolve the node host name to an IP. This is
generally true inside of cloud providers, but not necessarily the case
when self-hosting.

Fixes open-telemetry/opentelemetry-collector-contrib#22843
prashant-shahi pushed a commit to SigNoz/charts that referenced this issue Jan 19, 2024
Not all k8s deployments resolve the node host name to an IP. This is
generally true inside of cloud providers, but not necessarily the case
when self-hosting.

Fixes open-telemetry/opentelemetry-collector-contrib#22843
@alextricity25

Usually, cloud-provider-managed compute services have a private DNS setup for VMs, and the node's local domain is then appended to the /etc/resolv.conf that the kubelet passes to k8s pods, which makes the k8s node name resolvable (when it is the same as the VM's hostname). I am not familiar with DigitalOcean, but maybe they don't have a similar setup, or the instances are not assigned private DNS names. Either way, you can use the node IP instead, which can be passed as an env var with a snippet like the one below:

env:
- name: K8S_NODE_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

and then in the kubeletstats receiver

receivers:
  kubeletstats:
    endpoint: "${env:K8S_NODE_IP}:10250"

I would suggest opening an issue in the chart repo to add an option for using the node IP, and closing this one. This isn't something to fix in the receiver; it can be resolved by changing the config passed to it.

For those interested, I opened up a PR in the charts repo here:
open-telemetry/opentelemetry-helm-charts#1206

@momenthana

The above configuration was effective for me in addressing issues I was experiencing on DigitalOcean.

Since I was too busy to wait for a new chart version to be deployed, I added the following environment variable configuration to the opentelemetry-collector-daemonset Helm chart and used it with the existing chart version:

extraEnvs:
  - name: K8S_NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP

and then in the kubeletstats receiver

receivers:
  kubeletstats:
    endpoint: "${env:K8S_NODE_IP}:10250"
