[receiver/iis] iis.request.queue.age.max consistently fails to scrape #14575

Closed
BinaryFissionGames opened this issue Sep 28, 2022 · 10 comments · Fixed by #23234
Labels
bug Something isn't working priority:p2 Medium receiver/iis

Comments

@BinaryFissionGames
Contributor

BinaryFissionGames commented Sep 28, 2022

What happened?

Description

iis.request.queue.age.max consistently fails to scrape. This may be because the counter it scrapes, \HTTP Service Request Queues(*)\MaxQueueItemAge, doesn't seem to be populated consistently (it appears to be cleared when nothing is in the queue), and scraping it gives this error:

2022-09-28T17:47:57.094Z        warn    [email protected]/scraper.go:122      some performance counters could not be scraped;         {"kind": "receiver", "name": "iis", "pipeline": "metrics", "error": "A counter with a negative denominator value was detected.\r\n"}

This happens on every scrape, which ends up filling the logs with the same warning.

Even when generating load, I can't figure out how to get this metric to consistently report.
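
For reference, the counter can be probed outside the collector. Below is a minimal standalone sketch that reads it directly through pdh.dll; the DefaultAppPool instance name and the wrapper code itself are assumptions for illustration (substitute whatever instance "typeperf -q" lists for the HTTP Service Request Queues object), not how the receiver actually scrapes.

// Minimal standalone probe (illustration only, not the receiver's code) that reads
// \HTTP Service Request Queues(DefaultAppPool)\MaxQueueItemAge through pdh.dll to
// observe the PDH_CALC_NEGATIVE_DENOMINATOR status when the queue is idle.
// The DefaultAppPool instance name is an assumption; error handling is omitted for brevity.
package main

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/windows"
)

const (
	pdhFmtDouble               = 0x00000200 // PDH_FMT_DOUBLE
	pdhCalcNegativeDenominator = 0x800007D6 // PDH_CALC_NEGATIVE_DENOMINATOR (pdhmsg.h)
)

// PDH_FMT_COUNTERVALUE with the double member of the value union selected.
type pdhFmtCounterValueDouble struct {
	CStatus     uint32
	_           uint32 // padding so the 8-byte union member is aligned
	DoubleValue float64
}

func main() {
	pdh := windows.NewLazySystemDLL("pdh.dll")
	openQuery := pdh.NewProc("PdhOpenQueryW")
	addCounter := pdh.NewProc("PdhAddEnglishCounterW")
	collect := pdh.NewProc("PdhCollectQueryData")
	getValue := pdh.NewProc("PdhGetFormattedCounterValue")

	var query, counter uintptr
	path, _ := windows.UTF16PtrFromString(`\HTTP Service Request Queues(DefaultAppPool)\MaxQueueItemAge`)

	openQuery.Call(0, 0, uintptr(unsafe.Pointer(&query)))
	addCounter.Call(query, uintptr(unsafe.Pointer(path)), 0, uintptr(unsafe.Pointer(&counter)))
	collect.Call(query) // ELAPSED_TIME counters only need a single sample

	var val pdhFmtCounterValueDouble
	status, _, _ := getValue.Call(counter, pdhFmtDouble, 0, uintptr(unsafe.Pointer(&val)))

	if uint32(status) == pdhCalcNegativeDenominator || val.CStatus == pdhCalcNegativeDenominator {
		fmt.Println("idle queue: PDH reports a negative denominator for MaxQueueItemAge")
	} else {
		fmt.Printf("status=0x%X value=%v\n", uint32(status), val.DoubleValue)
	}
}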

Steps to Reproduce

Set up a default site for IIS, with no load.
Attempt to scrape using the iis receiver.
See error message in logs.

Expected Result

No error message if this is expected behavior. I'm not sure whether omitting the metric in this case is the correct behavior or not.

Actual Result

The error message is printed on every scrape.

Collector version

v0.60.0

Environment information

Encountered on Windows server 2019, with the default IIS website.

OpenTelemetry Collector configuration

receivers:
  iis:
    collection_interval: 30s

processors:
  batch:

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    metrics:
      receivers: [iis]
      processors: [batch]
      exporters: [logging]

Log output

2022-09-28T17:47:57.094Z        warn    [email protected]/scraper.go:122      some performance counters could not be scraped;         {"kind": "receiver", "name": "iis", "pipeline": "metrics", "error": "A counter with a negative denominator value was detected.\r\n"}

Additional context

No response

@BinaryFissionGames BinaryFissionGames added bug Something isn't working needs triage New item requiring triage labels Sep 28, 2022
@evan-bradley evan-bradley added priority:p2 Medium receiver/iis and removed needs triage New item requiring triage labels Sep 29, 2022
@github-actions
Contributor

Pinging code owners: @Mrod1598 @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Member

I recently looked into this same error and concluded that it was due to the counter rolling over. (See https://support.microfocus.com/kb/doc.php?id=7010545)

I opened #14343 with a change that I think may fix a flaky integration test that occasionally showed this error. That said, if it is happening every time, I'm not sure what to make of that. We should probably remove the metric unless someone can get to the bottom of it.

@BinaryFissionGames
Contributor Author

BinaryFissionGames commented Sep 29, 2022

Yeah, it's happening on every scrape, so that's not the problem here. This counter is of type ELAPSED_TIME (see: https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc756820(v=ws.10))

The denominator being negative, in this case, seems to just be the normal behavior when nothing is in the web server's HTTP queue. I think we just need to ignore this specific error for this specific metric.

I'd like to know which counter is failing in that other case; knowing the type of the counter might help figure out what's going on. I don't know if counter rollover makes sense there; they both measure time since startup, and it would seem odd for the counters to roll over in the short time that the tests run.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 29, 2022
@Doron-Bargo
Contributor

Hi, we are still seeing this error. Are there any plans to fix it?

@djaglowski djaglowski removed the Stale label Apr 13, 2023
@BinaryFissionGames
Contributor Author

@djaglowski I can take a look into this if you like.

We can either

  • Emit 0 for this metric when we get the "A counter with a negative denominator value was detected.\r\n" error, and skip logging the error.
  • Remove this metric from the receiver.

I think the first option is the correct behavior, since nothing is in the queue when we get this error.
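
A minimal sketch of what option 1 could look like, assuming a hypothetical hook where the scraper consumes the formatted value for this one metric; the helper names (isNegativeDenominator, scrapeQueueAgeMax) and the string-matching approach are illustrative, not the receiver's actual code:

// Illustration only: treat the negative-denominator condition on
// iis.request.queue.age.max as "queue is empty" and record 0 instead of warning.
package iisreceiver

import "strings"

// Message PDH surfaces for this condition; matching on the text is a placeholder
// for comparing the underlying PDH status code if the receiver has access to it.
const negativeDenominatorMsg = "A counter with a negative denominator value was detected."

func isNegativeDenominator(err error) bool {
	return err != nil && strings.Contains(err.Error(), negativeDenominatorMsg)
}

// scrapeQueueAgeMax records iis.request.queue.age.max, mapping the
// negative-denominator condition to 0 and suppressing the warning,
// while still surfacing any unrelated counter error as before.
func scrapeQueueAgeMax(read func() (float64, error), record func(float64)) error {
	v, err := read()
	if isNegativeDenominator(err) {
		record(0) // nothing in the HTTP request queue
		return nil
	}
	if err != nil {
		return err
	}
	record(v)
	return nil
}

If the raw PDH status is available at that point, comparing against PDH_CALC_NEGATIVE_DENOMINATOR directly would be more robust than matching the error message.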

@djaglowski
Member

@BinaryFissionGames, that sounds good to me. Thanks.

@BinaryFissionGames
Contributor Author

Just want to update that this is still on my radar; I was looking into this last week and found it was a little more complicated than I thought it would be due to how the data is retrieved from the Windows API.

@djaglowski
Member

@BinaryFissionGames, any update on this?

@BinaryFissionGames
Contributor Author

@djaglowski I'll put a PR out for this tonight and we can iterate on it.
