Reduce Warn verbosity, switch to Info by default #1931

tsg · 2016-06-29T10:49:02Z

The current state is that we log at Error level by default. The historical reason is that Packetbeat could generate a lot of valid warnings (e.g. when traffic contains invalid patterns) which result in slowing it down for the rest of traffic, or filling up the disk, or rotating the logs too fast, etc.

But printing only errors by default is also not great, because:

the user gets no feedback that the Beat correctly started
in some cases, critical information is hidden. See lack of logging running as service on Ubuntu #1734 for an example
very low logging defaults means that for almost any troubleshooting we need to ask the user to increase verbosity

In general, we want to avoid warnings all together, as they are not friendly to the ops people. For us, warnings (and to some extent INFO messages) are essentially debug messages that we want to have always on. The plan is to replace Warnings with expvar metrics, which can be inspected in realtime via the -httpprof option and for which we can also write periodic summaries to the logs.

This is a meta ticket to:

The text was updated successfully, but these errors were encountered:

tsg · 2016-06-29T11:10:34Z

Compared to the other Beats, Filebeat is a bit more verbose at INFO level. For example, it prints a message on each bulk publish:

2016/06/29 10:50:49.536775 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:50.406032 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:50.407090 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:51.227240 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:51.228463 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:52.017300 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:52.018828 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:53.086974 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:53.096369 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:53.862119 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:53.864995 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:54.688896 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:54.690475 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:55.461237 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:55.463595 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:56.321635 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:56.322822 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:57.372060 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:57.373635 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:57.970777 prospector.go:128: INFO Run prospector
2016/06/29 10:50:58.285754 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:58.286825 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:59.011178 publish.go:111: INFO Events sent: 2048

Also it prints an Info on each prospector run, on each file open, etc.

@elastic/beats Do we generally want Beats to print stuff periodically by default to help with troubleshooting, or should we move stuff like the above to debug level? Note that a more aggresive scan_frequency setting (to give an example) will result in even more logs.

ruflin · 2016-06-29T11:51:46Z

There are definitively messages which should be moved to debug like Run Prospector. I somehow like the "Events sent" and "states written" in here as it shows progress, but in high load environment that could become too much. At the same time these could switch to WARN or ERR?

In general I think each beat should print some info messages at startup to say that all configs were read and startup is correct, and then during running send some status updates every few seconds so it can be seen that something happens. For the publisher, it could be interesting that every 10 seconds for example x events sent is logged (summary).

tsg · 2016-06-29T11:53:46Z

Yeah, we could replace filebeat specific "Events sent" with a libbeat debug statement that prints selected expvars. That would be quite nice.

Demoted warnings like "Two requests without a response" or "Response without a request" to the Debug level. To compensate, expvar counters are added for these. These types of warnings are normal in some situations (e.g. at startup) but can indicate serious issues in the other cases. This is part of elastic#1931, which has the overall goal for switching the default logging level to INFO.

Demoted warnings like "Two requests without a response" or "Response without a request" to the Debug level. To compensate, expvar counters are added for these. These types of warnings are normal in some situations (e.g. at startup) but can indicate serious issues in the other cases. This is part of #1931, which has the overall goal for switching the default logging level to INFO.

This is part of elastic#1931 and fixes elastic#1734. Some things to note/discuss: * `-v` does nothing by default now. However, it's possible that you set it to error in the config file and then use -v to go back to info level at the CLI. So I kept the option. * the `logging.level: debug` from the default configuration also doesn nothing now, unless you also specify some selectors. So I added also a `logging.selectors: ["*"]` to the default short config. * While not all of elastic#1931 is done yet, I think we can proceed with the default level change to fix elastic#1734 and then we do more INFO/WARN cleanups in future PRs.

This is part of #1931 and fixes #1734. Some things to note/discuss: * `-v` does nothing by default now. However, it's possible that you set it to error in the config file and then use -v to go back to info level at the CLI. So I kept the option. * the `logging.level: debug` from the default configuration also doesn nothing now, unless you also specify some selectors. So I added also a `logging.selectors: ["*"]` to the default short config. * While not all of #1931 is done yet, I think we can proceed with the default level change to fix #1734 and then we do more INFO/WARN cleanups in future PRs.

Prints the delta since the last period, if it's non-zero. The idea is to help with troubleshooting while still creating relatively little output in the logs. 30s seems like a good compromise. Part of/related to elastic#1931

Prints the delta since the last period, if it's non-zero. The idea is to help with troubleshooting while still creating relatively little output in the logs. 30s seems like a good compromise. Part of/related to #1931

Part of elastic#1931, this applies the metricbeat conventions to the names of the expvars. Most of the counters are still top level integers.

Part of elastic#1931, this adds logic to descend into maps. This is useful especially to support the metricbeat fetches stats. Locking should be safe as the expvar.Map Do function takes a read lock on the map. Note that expvars of type Func are still not included, which is why the memstats are not logged.

Part of #1931, this adds logic to descend into maps. This is useful especially to support the metricbeat fetches stats. Locking should be safe as the expvar.Map Do function takes a read lock on the map. Note that expvars of type Func are still not included, which is why the memstats are not logged.

Part of #1931, this applies the metricbeat conventions to the names of the expvars. Most of the counters are still top level integers.

Part of elastic#1931. Replaces two of the "periodic" Infos with Debugs + expvars.

Part of #1931. Replaces two of the "periodic" Infos with Debugs + expvars.

Transform a debug in Info for the loadbalancing mode, to make sure we have at least an Info on any connection failure. Part of elastic#1931.

Transform a debug in Info for the loadbalancing mode, to make sure we have at least an Info on any connection failure. Part of #1931.

Part of elastic#1931. The default log level was changed to INFO in PR elastic#1952, but the change wasn't reflected in the default configuration files. This fixes it. The issue was signaled in elastic#2183. Also fixes the logging level in the docs.

Part of #1931. The default log level was changed to INFO in PR #1952, but the change wasn't reflected in the default configuration files. This fixes it. The issue was signaled in #2183. Also fixes the logging level in the docs.

Replaces the similar functionality from Metricbeat and Winlogbeat, generalizing it to all Beats. Part of elastic#1931.

Replaces the similar functionality from Metricbeat and Winlogbeat, generalizing it to all Beats. Part of #1931.

Part of elastic#1931.

Part of #1931.

tsg · 2016-08-24T12:10:32Z

All done, now.

tsg added refactoring v5.0.0-beta1 labels Jun 29, 2016

tsg added the meta label Jul 1, 2016

tsg mentioned this issue Jul 1, 2016

Replace most Packetbeat warnings with expvar metrics #1948

Merged

tsg added the in progress Pull request is currently in progress. label Jul 1, 2016

tsg mentioned this issue Jul 4, 2016

Switch default level to INFO #1952

Merged

tsg mentioned this issue Jul 4, 2016

Log non-zero expvars periodically #1955

Merged

tsg pushed a commit to tsg/beats that referenced this issue Jul 5, 2016

Use dots for expvar names

35c630c

Part of elastic#1931, this applies the metricbeat conventions to the names of the expvars. Most of the counters are still top level integers.

tsg mentioned this issue Jul 5, 2016

Use dots for expvar names #1967

Merged

tsg mentioned this issue Jul 5, 2016

Snapshot expvar maps for periodic logging #1969

Merged

tsg mentioned this issue Jul 5, 2016

Reduce filebeat verbosity at INFO level #1970

Merged

ruflin pushed a commit that referenced this issue Jul 6, 2016

Use dots for expvar names (#1967)

114c428

Part of #1931, this applies the metricbeat conventions to the names of the expvars. Most of the counters are still top level integers.

tsg pushed a commit to tsg/beats that referenced this issue Jul 6, 2016

Reduce filebeat verbosity at INFO level

2bd61e0

Part of elastic#1931. Replaces two of the "periodic" Infos with Debugs + expvars.

tsg mentioned this issue Jul 6, 2016

Log connection attempts #1973

Merged

ruflin pushed a commit that referenced this issue Jul 6, 2016

Reduce filebeat verbosity at INFO level (#1970)

6460fce

Part of #1931. Replaces two of the "periodic" Infos with Debugs + expvars.

tsg pushed a commit to tsg/beats that referenced this issue Jul 7, 2016

Log connection attempts

af59be8

Transform a debug in Info for the loadbalancing mode, to make sure we have at least an Info on any connection failure. Part of elastic#1931.

urso pushed a commit that referenced this issue Jul 7, 2016

Log connection attempts (#1973)

3972206

Transform a debug in Info for the loadbalancing mode, to make sure we have at least an Info on any connection failure. Part of #1931.

tsg mentioned this issue Aug 9, 2016

Log event publishing errors as Err (was Info) #2183

Merged

tsg mentioned this issue Aug 9, 2016

Correct the default log level in configs #2199

Merged

tsg removed the v5.0.0-beta1 label Aug 22, 2016

tsg mentioned this issue Aug 23, 2016

Log total non-zero expvars on shutdown #2349

Merged

tsg pushed a commit to tsg/beats that referenced this issue Aug 23, 2016

Log total non-zero expvars on shutdown

808f09e

Replaces the similar functionality from Metricbeat and Winlogbeat, generalizing it to all Beats. Part of elastic#1931.

ruflin pushed a commit that referenced this issue Aug 23, 2016

Log total non-zero expvars on shutdown (#2349)

fec9ed1

Replaces the similar functionality from Metricbeat and Winlogbeat, generalizing it to all Beats. Part of #1931.

tsg pushed a commit to tsg/beats that referenced this issue Aug 24, 2016

Set info as log level in the full configs

f9c2527

Part of elastic#1931.

tsg mentioned this issue Aug 24, 2016

Set info as log level in the full configs #2364

Merged

ruflin pushed a commit that referenced this issue Aug 24, 2016

Set info as log level in the full configs (#2364)

14b70af

Part of #1931.

tsg closed this as completed Aug 24, 2016

tsg removed the in progress Pull request is currently in progress. label Aug 24, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce Warn verbosity, switch to Info by default #1931

Reduce Warn verbosity, switch to Info by default #1931

tsg commented Jun 29, 2016 •

edited

Loading

tsg commented Jun 29, 2016

ruflin commented Jun 29, 2016

tsg commented Jun 29, 2016 •

edited

Loading

tsg commented Aug 24, 2016

Reduce Warn verbosity, switch to Info by default #1931

Reduce Warn verbosity, switch to Info by default #1931

Comments

tsg commented Jun 29, 2016 • edited Loading

tsg commented Jun 29, 2016

ruflin commented Jun 29, 2016

tsg commented Jun 29, 2016 • edited Loading

tsg commented Aug 24, 2016

tsg commented Jun 29, 2016 •

edited

Loading

tsg commented Jun 29, 2016 •

edited

Loading