feat: de-couple --log.prober to be an independent logger#1461
Conversation
This is the beginning of the changes in architecture/behavior for the scrape probe logger as described here: prometheus#1377 (comment) Currently, the scrape probe logger is a weird transforming-wrapper around the primary logger, that comes with unconventional behavior -- it hijacks log calls and forces them to a pre-set level. This both confusing to users (see prometheus#1377, prometheus#1411, prometheus#1401, potentially others), as well as very inflexible; by forcing logs to a specific level, we lose the ability to do more nuanced logging like using debug to increase verbosity or writing errors at error level instead of $configuredLevel. This commit converts the scrape probe logger to be it's own full/independent logger via promslog, same as the primary logger. Rather than wrapping and munging the log entries, we use the primary promslog.Config as the basis of the config for the new scrape probe logger. We override the level field to reflect the level var that corresponds to the `--log.prober` flag, and then use that config to create the scrape probe logger. **NOTES**: - This intentionally leaves the `--log.prober` flag default to `debug` for now -- most probers write logs at `info` level, which would result in more log spamming for users. In order for the blackbox exporter to truly take advantage of these changes, the probers will need to be updated to correctly use leveled logging. - Since we now use v0.65.0 for prometheus/common, I removed the `getSlogLevel` helper func in favor of getting the level directly from the config. Example debug output for a scrape: ``` ~/go/src/github.com/prometheus/prometheus (main [ ]) -> curl -sL "http://localhost:9115/probe?target=prometheus.io&module=http_2xx&debug=true" | head -20 Logs for the probe: time=2025-09-04T12:07:05.573-04:00 level=INFO source=handler.go:116 msg="Beginning probe" module=http_2xx target=prometheus.io probe=http timeout_seconds=119.5 time=2025-09-04T12:07:05.574-04:00 level=INFO source=utils.go:61 msg="Resolving target address" module=http_2xx target=prometheus.io target=prometheus.io ip_protocol=ip4 time=2025-09-04T12:07:05.574-04:00 level=INFO source=utils.go:96 msg="Resolved target address" module=http_2xx target=prometheus.io target=prometheus.io ip=172.67.201.240 time=2025-09-04T12:07:05.574-04:00 level=INFO source=http.go:209 msg="Making HTTP request" module=http_2xx target=prometheus.io url=http://172.67.201.240 host=prometheus.io time=2025-09-04T12:07:05.787-04:00 level=INFO source=http.go:490 msg="Received redirect" module=http_2xx target=prometheus.io location=https://prometheus.io/ time=2025-09-04T12:07:05.787-04:00 level=INFO source=http.go:209 msg="Making HTTP request" module=http_2xx target=prometheus.io url=https://prometheus.io/ host="" time=2025-09-04T12:07:05.787-04:00 level=INFO source=http.go:224 msg="Address does not match first address, not sending TLS ServerName" module=http_2xx target=prometheus.io first=172.67.201.240 address=prometheus.io time=2025-09-04T12:07:06.216-04:00 level=INFO source=http.go:590 msg="Received HTTP response" module=http_2xx target=prometheus.io status_code=200 time=2025-09-04T12:07:06.507-04:00 level=INFO source=http.go:721 msg="Response timings for roundtrip" module=http_2xx target=prometheus.io roundtrip=0 start=2025-09-04T12:07:05.574-04:00 dnsDone=2025-09-04T12:07:05.574-04:00 connectDone=2025-09-04T12:07:05.679-04:00 gotConn=2025-09-04T12:07:05.679-04:00 responseStart=2025-09-04T12:07:05.787-04:00 tlsStart=0001-01-01T00:00:00.000Z tlsDone=0001-01-01T00:00:00.000Z end=0001-01-01T00:00:00.000Z time=2025-09-04T12:07:06.507-04:00 level=INFO source=http.go:721 msg="Response timings for roundtrip" module=http_2xx target=prometheus.io roundtrip=1 start=2025-09-04T12:07:05.787-04:00 dnsDone=2025-09-04T12:07:05.788-04:00 connectDone=2025-09-04T12:07:05.912-04:00 gotConn=2025-09-04T12:07:06.043-04:00 responseStart=2025-09-04T12:07:06.215-04:00 tlsStart=2025-09-04T12:07:05.912-04:00 tlsDone=2025-09-04T12:07:06.043-04:00 end=2025-09-04T12:07:06.507-04:00 time=2025-09-04T12:07:06.507-04:00 level=INFO source=handler.go:127 msg="Probe succeeded" module=http_2xx target=prometheus.io duration_seconds=0.933592996 Metrics that would have been returned: probe_dns_lookup_time_seconds 0.000605565 ``` Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
|
cc: @electron0zero for when you have time -- I still plan to follow up on this PR with another commit to audit the probers and improve the log leveling so they're not writing at info level in-code |
Not gonna go into in-depth tests of the logger here, this is already sufficient to validate both that leveling works and that logs no longer all get coerced to the same level as they were previously. Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Now that the scrape probe logger is an independent logger and can do proper leveling, this does a bunch of updates to the various prober types to improve the actual leveling of the probe logs. General patterns/concepts applied: - if it's within an error check, log at error level by default. - if the error is non-fatal, appropriate for a warning level message, and similar bits of code also use warning level instead of error, log at warning level - if the log is verbose/granular, information dense, and targeted at primarily at us as developers for troubleshooting the flow of a probe, then it's logged at debug level. Since the probe logger is an independent logger now, we can be stricter about what events qualify as general info level events useful to both operators and developers. Example of an `http_2xx` probe against `prometheus.io` with `--log.prober=info`: ``` ~/go/src/github.com/prometheus/prometheus (main [ ]) -> curl -sL "http://localhost:9115/probe?target=prometheus.io&module=http_2xx&debug=true" | sed '/Metrics that would have been returned/q' Logs for the probe: time=2025-09-08T18:22:23.217-04:00 level=WARN source=http.go:490 msg="Received redirect" module=http_2xx target=prometheus.io location=https://prometheus.io/ Metrics that would have been returned: ``` And here is an example of an `http_2xx` probe against `prometheus.io` with `--log.prober=debug`: ``` ~/go/src/github.com/prometheus/prometheus (main [ ]) -> curl -sL "http://localhost:9115/probe?target=prometheus.io&module=http_2xx&debug=true" | sed '/Metrics that would have been returned/q' Logs for the probe: time=2025-09-08T18:22:40.401-04:00 level=DEBUG source=handler.go:116 msg="Beginning probe" module=http_2xx target=prometheus.io probe=http timeout_seconds=119.5 time=2025-09-08T18:22:40.401-04:00 level=DEBUG source=utils.go:61 msg="Resolving target address" module=http_2xx target=prometheus.io target=prometheus.io ip_protocol=ip4 time=2025-09-08T18:22:40.403-04:00 level=DEBUG source=utils.go:96 msg="Resolved target address" module=http_2xx target=prometheus.io target=prometheus.io ip=172.67.201.240 time=2025-09-08T18:22:40.403-04:00 level=DEBUG source=http.go:209 msg="Making HTTP request" module=http_2xx target=prometheus.io url=http://172.67.201.240 host=prometheus.io time=2025-09-08T18:22:40.558-04:00 level=WARN source=http.go:490 msg="Received redirect" module=http_2xx target=prometheus.io location=https://prometheus.io/ time=2025-09-08T18:22:40.558-04:00 level=DEBUG source=http.go:209 msg="Making HTTP request" module=http_2xx target=prometheus.io url=https://prometheus.io/ host="" time=2025-09-08T18:22:40.558-04:00 level=DEBUG source=http.go:224 msg="Address does not match first address, not sending TLS ServerName" module=http_2xx target=prometheus.io first=172.67.201.240 address=prometheus.io time=2025-09-08T18:22:40.916-04:00 level=DEBUG source=http.go:590 msg="Received HTTP response" module=http_2xx target=prometheus.io status_code=200 time=2025-09-08T18:22:41.135-04:00 level=DEBUG source=http.go:721 msg="Response timings for roundtrip" module=http_2xx target=prometheus.io roundtrip=0 start=2025-09-08T18:22:40.403-04:00 dnsDone=2025-09-08T18:22:40.403-04:00 connectDone=2025-09-08T18:22:40.479-04:00 gotConn=2025-09-08T18:22:40.479-04:00 responseStart=2025-09-08T18:22:40.558-04:00 tlsStart=0001-01-01T00:00:00.000Z tlsDone=0001-01-01T00:00:00.000Z end=0001-01-01T00:00:00.000Z time=2025-09-08T18:22:41.135-04:00 level=DEBUG source=http.go:721 msg="Response timings for roundtrip" module=http_2xx target=prometheus.io roundtrip=1 start=2025-09-08T18:22:40.558-04:00 dnsDone=2025-09-08T18:22:40.559-04:00 connectDone=2025-09-08T18:22:40.629-04:00 gotConn=2025-09-08T18:22:40.740-04:00 responseStart=2025-09-08T18:22:40.916-04:00 tlsStart=2025-09-08T18:22:40.629-04:00 tlsDone=2025-09-08T18:22:40.740-04:00 end=2025-09-08T18:22:41.135-04:00 time=2025-09-08T18:22:41.136-04:00 level=DEBUG source=handler.go:127 msg="Probe succeeded" module=http_2xx target=prometheus.io duration_seconds=0.734697419 Metrics that would have been returned: ``` Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
|
@electron0zero alright, I think this is ready for a closer look when you have time. Feedback welcome, especially on the docs updates and the probe leveling changes. As a maintainer, I'll defer to you as to how to handle the release announcement and publishing the breakage/change in behavior. I do see there's a breaking changes section in the changelog template, but unsure if there's other things that should be done as well. |
@tjhop we just add the breaking change in the change log with the details of what's breaking and how users are impacted during the release, I don't see a breaking change in the past but see prometheus CHANGELOG.md for examples. In this case end users probably need to know two things:
you can add these two things in the PR desc with |
electron0zero
left a comment
There was a problem hiding this comment.
change looks good to me, thanks for taking this up 🙏🏼
|
I am going to wait 2-3 weeks before I merge, so other maintainers have time to review and give feedback. |
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
|
it's been two weeks, thanks @tjhop, and great work on this 👏🏼 🙏🏼 |
undo some of the log level changes made in prometheus#1461 fixes prometheus#1509
undo some of the log level changes made in prometheus#1461 fixes prometheus#1509 Signed-off-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>
|
Still unclear logging behavior |
|
please see https://github.com/prometheus/blackbox_exporter/tree/master?tab=readme-ov-file#controlling-log-level-for-probe-logs and the example logs to better understand the logging behavior |
|
I read this section before asking here. None of the flags allow me to achieve what I want: for the application log to only contain errors, while a curl request with the |
Debug logs for the probe/scrape, you mean? The distinction is important, now that the scrape probe logger is independent from the main application logger. If so, I think the only way to accomplish that would be to also have the scrape probe logger write to a different destination than the main application logger. As mentioned in the commit:
When creating the scrape probe logger config, we use the same @electron0zero @SuperQ would you have thoughts on adding a flag like |
|
Having a separate output flag seems reasonable. I'm going to lock this PR so we don't get necro-posting. New question/ issues should go to the community or issue tracking in the case of bugs and feature requests. |
This is the beginning of the changes in architecture/behavior for the
scrape probe logger as described here:
#1377 (comment)
Currently, the scrape probe logger is a weird transforming-wrapper
around the primary logger, that comes with unconventional behavior -- it
hijacks log calls and forces them to a pre-set level. This both
confusing to users (see #1377, #1411, #1401, potentially others), as
well as very inflexible; by forcing logs to a specific level, we lose
the ability to do more nuanced logging like using debug to increase
verbosity or writing errors at error level instead of $configuredLevel.
This commit converts the scrape probe logger to be it's own
full/independent logger via promslog, same as the primary logger. Rather
than wrapping and munging the log entries, we use the primary
promslog.Config as the basis of the config for the new scrape probe
logger. We override the level field to reflect the level var that
corresponds to the
--log.proberflag, and then use that config tocreate the scrape probe logger.
NOTES:
This intentionally leaves theEdit: This has been updated in later commits. The--log.proberflag default todebugfor now -- most probers write logs at
infolevel, which would resultin more log spamming for users. In order for the blackbox exporter to
truly take advantage of these changes, the probers will need to be
updated to correctly use leveled logging.
--log.proberflag now defaults toinfolike a normal logger, and the probers have been updated to better use leveled logging.getSlogLevelhelper func in favor of getting the level directly fromthe config.
Example debug output for a scrape:
Signed-off-by: TJ Hoplock t.hoplock@gmail.com