Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metric - Error in collected metric http_server_duration #737

Closed
julianocosta89 opened this issue Feb 9, 2023 · 9 comments · Fixed by #738 or open-telemetry/opentelemetry-js#3710
Closed
Labels
bug Something isn't working

Comments

@julianocosta89
Copy link
Member

Bug Report

Which version of the demo you are using?
6af2bdc.

Symptom

A clear and concise description of what the bug is.

What is the expected behavior?
I expect the sample to run without any errors.

What is the actual behavior?

After starting the sample, I started getting the following error on the OTel collector logs:

otel-col                 | * collected metric http_server_duration label:<name:"container_id" value:"354b4b881bf8956b0483f2231f348cd99f4a56e087617808e1ff88ac435400d2" > label:<name:"http_flavor" value:"HTTP/2" > label:<name:"http_host" value:"cartservice:7070" > label:<name:"http_method" value:"POST" > label:<name:"http_scheme" value:"http" > label:<name:"http_status_code" value:"200" > label:<name:"http_target" value:"/hipstershop.CartService/GetCart" > label:<name:"job" value:"opentelemetry-demo/cartservice" > label:<name:"service_name" value:"cartservice" > label:<name:"service_namespace" value:"opentelemetry-demo" > label:<name:"telemetry_sdk_language" value:"dotnet" > label:<name:"telemetry_sdk_name" value:"opentelemetry" > label:<name:"telemetry_sdk_version" value:"1.4.0.665" > histogram:<sample_count:615 sample_sum:689.5118000000004 bucket:<cumulative_count:0 upper_bound:0 > bucket:<cumulative_count:613 upper_bound:5 > bucket:<cumulative_count:614 upper_bound:10 > bucket:<cumulative_count:614 upper_bound:25 > bucket:<cumulative_count:614 upper_bound:50 > bucket:<cumulative_count:614 upper_bound:75 > bucket:<cumulative_count:614 upper_bound:100 > bucket:<cumulative_count:615 upper_bound:250 > bucket:<cumulative_count:615 upper_bound:500 > bucket:<cumulative_count:615 upper_bound:750 > bucket:<cumulative_count:615 upper_bound:1000 > bucket:<cumulative_count:615 upper_bound:2500 > bucket:<cumulative_count:615 upper_bound:5000 > bucket:<cumulative_count:615 upper_bound:7500 > bucket:<cumulative_count:615 upper_bound:10000 > >  has help "measures the duration of the inbound HTTP request" but should have "measures the duration of the inbound HTTP requests"
otel-col                 |      {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
otel-col                 | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*promLogger).Println
otel-col                 |      github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/log.go:34
otel-col                 | github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
otel-col                 |      github.com/prometheus/[email protected]/prometheus/promhttp/http.go:139
otel-col                 | net/http.HandlerFunc.ServeHTTP
otel-col                 |      net/http/server.go:2109
otel-col                 | net/http.(*ServeMux).ServeHTTP
otel-col                 |      net/http/server.go:2487
otel-col                 | go.opentelemetry.io/collector/config/confighttp.(*decompressor).wrap.func1
otel-col                 |      go.opentelemetry.io/[email protected]/config/confighttp/compression.go:162
otel-col                 | net/http.HandlerFunc.ServeHTTP
otel-col                 |      net/http/server.go:2109
otel-col                 | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP
otel-col                 |      go.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:210
otel-col                 | go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
otel-col                 |      go.opentelemetry.io/[email protected]/config/confighttp/clientinfohandler.go:39
otel-col                 | net/http.serverHandler.ServeHTTP
otel-col                 |      net/http/server.go:2947
otel-col                 | net/http.(*conn).serve
otel-col                 |      net/http/server.go:1991

Reproduce

Clone the repo and simply run:

docker compose up

We will close this issue if:

  • The steps you provided are complex.
  • If we can not reproduce the behavior you're reporting.

Additional Context

The app runs normally, but from the logs it seems that some metrics are not being sent to Prometheus.

@julianocosta89 julianocosta89 added the bug Something isn't working label Feb 9, 2023
@austinlparker
Copy link
Member

Hm, this seems like a mismatch between metrics somehow. Do we have the same metric in two places with mismatched help?

@austinlparker
Copy link
Member

Interesting, that metric seems to have changed. At head...

# TYPE http_server_duration histogram
http_server_duration_bucket{http_flavor="HTTP/2",http_host="cartservice:7070",http_method="POST",http_scheme="http",http_status_code="200",http_target="/hipstershop.CartService/AddItem",job="opentelemetry-demo/cartservice",service_name="cartservice",service_namespace="opentelemetry-demo",telemetry_sdk_language="dotnet",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.4.0.665",le="0"} 0

1.3.0

# HELP http_server_duration measures the duration of the inbound HTTP requests
# TYPE http_server_duration histogram
http_server_duration_bucket{container_id="8d004e7bce6343dfc65a4f5d7a562750588b7966e8ed4b05275e03aaac1c760d",host_arch="arm64",host_name="8d004e7bce63",http_flavor="1.1",http_method="GET",http_scheme="http",http_status_code="200",job="opentelemetry-demo/frontend",net_host_name="frontend",net_host_port="8080",os_type="linux",os_version="5.15.49-linuxkit",process_command="/app/server.js",process_command_line="/usr/local/bin/node /app/server.js",process_executable_name="node",process_pid="17",process_runtime_description="Node.js",process_runtime_name="nodejs",process_runtime_version="18.14.0",service_name="frontend",service_namespace="opentelemetry-demo",le="0"} 0

However, I think what's happening is that these metrics are in head and 1.3, but what's happening is that they're racing each other. The problem is, the #HELP text differs between them (which I believe is the OTLP metric description), which causes the prometheus exporter to not merge them correctly, but whichever one gets scraped + reported first is the one that 'wins' and prometheus won't overwrite what's already there?

@austinlparker
Copy link
Member

Keeping this open as we'll need to take the release from .NET which fixes it upstream.

@puckpuck
Copy link
Contributor

puckpuck commented Mar 4, 2023

I tried taking the latest 1.4.0/1.0.0-rc9.14 from the .NET SDKs and still got the error. This PR should fix it. I will test it again after it's released.

@cartersocha
Copy link
Contributor

@puckpuck
Copy link
Contributor

puckpuck commented Apr 1, 2023

I worked with @martinjt on this ... and it turns out I was barking up the wrong tree and I misunderstood how .NET instrumentation packages were done 🤦

We still see this error because JS also had the same incorrect description that the .NET SDK had. I have this PR in the JS repo to address this.

log entry from collector

otel-col  | * collected metric http_server_duration label:<name:"container_id" value:"e7414d44db0e8b8029644b59a47717a325a6c420f62dc08b293dc30c4b750c71" > label:<name:"host_arch" value:"arm64" > label:<name:"host_name" value:"e7414d44db0e" > label:<name:"http_flavor" value:"1.1" > label:<name:"http_method" value:"GET" > label:<name:"http_scheme" value:"http" > label:<name:"http_status_code" value:"200" > label:<name:"job" value:"opentelemetry-demo/frontend" > label:<name:"net_host_name" value:"frontend" > label:<name:"net_host_port" value:"8080" > label:<name:"os_type" value:"linux" > label:<name:"os_version" value:"5.15.49-linuxkit" > label:<name:"process_command" value:"/app/server.js" > label:<name:"process_command_line" value:"/usr/local/bin/node /app/server.js" > label:<name:"process_executable_name" value:"node" > label:<name:"process_pid" value:"16" > label:<name:"process_runtime_description" value:"Node.js" > label:<name:"process_runtime_name" value:"nodejs" > label:<name:"process_runtime_version" value:"18.15.0" > label:<name:"service_name" value:"frontend" > label:<name:"service_namespace" value:"opentelemetry-demo" > label:<name:"telemetry_sdk_language" value:"nodejs" > label:<name:"telemetry_sdk_name" value:"opentelemetry" > label:<name:"telemetry_sdk_version" value:"1.10.1" > histogram:<sample_count:122 sample_sum:3952 bucket:<cumulative_count:0 upper_bound:0 > bucket:<cumulative_count:19 upper_bound:5 > bucket:<cumulative_count:46 upper_bound:10 > bucket:<cumulative_count:103 upper_bound:25 > bucket:<cumulative_count:115 upper_bound:50 > bucket:<cumulative_count:116 upper_bound:75 > bucket:<cumulative_count:116 upper_bound:100 > bucket:<cumulative_count:118 upper_bound:250 > bucket:<cumulative_count:118 upper_bound:500 > bucket:<cumulative_count:122 upper_bound:1000 > >  has help "measures the duration of the inbound HTTP requests" but should have "Measures the duration of inbound HTTP requests."

@austinlparker
Copy link
Member

The JS PR to fix this was merged (open-telemetry/opentelemetry-js#3710)

@julianocosta89
Copy link
Member Author

we just need a new release now 🤞🏽

@puckpuck
Copy link
Contributor

puckpuck commented Aug 4, 2023

I believe this is all resolved now, and we can close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
4 participants