Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce Warn verbosity, switch to Info by default #1931

Closed
11 tasks done
tsg opened this issue Jun 29, 2016 · 4 comments
Closed
11 tasks done

Reduce Warn verbosity, switch to Info by default #1931

tsg opened this issue Jun 29, 2016 · 4 comments

Comments

@tsg
Copy link
Contributor

tsg commented Jun 29, 2016

The current state is that we log at Error level by default. The historical reason is that Packetbeat could generate a lot of valid warnings (e.g. when traffic contains invalid patterns) which result in slowing it down for the rest of traffic, or filling up the disk, or rotating the logs too fast, etc.

But printing only errors by default is also not great, because:

  • the user gets no feedback that the Beat correctly started
  • in some cases, critical information is hidden. See lack of logging running as service on Ubuntu #1734 for an example
  • very low logging defaults means that for almost any troubleshooting we need to ask the user to increase verbosity

In general, we want to avoid warnings all together, as they are not friendly to the ops people. For us, warnings (and to some extent INFO messages) are essentially debug messages that we want to have always on. The plan is to replace Warnings with expvar metrics, which can be inspected in realtime via the -httpprof option and for which we can also write periodic summaries to the logs.

This is a meta ticket to:

@tsg
Copy link
Contributor Author

tsg commented Jun 29, 2016

Compared to the other Beats, Filebeat is a bit more verbose at INFO level. For example, it prints a message on each bulk publish:

2016/06/29 10:50:49.536775 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:50.406032 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:50.407090 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:51.227240 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:51.228463 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:52.017300 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:52.018828 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:53.086974 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:53.096369 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:53.862119 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:53.864995 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:54.688896 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:54.690475 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:55.461237 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:55.463595 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:56.321635 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:56.322822 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:57.372060 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:57.373635 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:57.970777 prospector.go:128: INFO Run prospector
2016/06/29 10:50:58.285754 publish.go:111: INFO Events sent: 2048
2016/06/29 10:50:58.286825 registrar.go:239: INFO Registry file updated. 12 states written.
2016/06/29 10:50:59.011178 publish.go:111: INFO Events sent: 2048

Also it prints an Info on each prospector run, on each file open, etc.

@elastic/beats Do we generally want Beats to print stuff periodically by default to help with troubleshooting, or should we move stuff like the above to debug level? Note that a more aggresive scan_frequency setting (to give an example) will result in even more logs.

@ruflin
Copy link
Member

ruflin commented Jun 29, 2016

There are definitively messages which should be moved to debug like Run Prospector. I somehow like the "Events sent" and "states written" in here as it shows progress, but in high load environment that could become too much. At the same time these could switch to WARN or ERR?

In general I think each beat should print some info messages at startup to say that all configs were read and startup is correct, and then during running send some status updates every few seconds so it can be seen that something happens. For the publisher, it could be interesting that every 10 seconds for example x events sent is logged (summary).

@tsg
Copy link
Contributor Author

tsg commented Jun 29, 2016

Yeah, we could replace filebeat specific "Events sent" with a libbeat debug statement that prints selected expvars. That would be quite nice.

@tsg tsg added the meta label Jul 1, 2016
tsg pushed a commit to tsg/beats that referenced this issue Jul 1, 2016
Demoted warnings like "Two requests without a response" or
"Response without a request" to the Debug level. To compensate,
expvar counters are added for these.

These types of warnings are normal in some situations (e.g. at startup)
but can indicate serious issues in the other cases.

This is part of elastic#1931, which has the overall goal for switching the
default logging level to INFO.
@tsg tsg added the in progress Pull request is currently in progress. label Jul 1, 2016
ruflin pushed a commit that referenced this issue Jul 4, 2016
Demoted warnings like "Two requests without a response" or
"Response without a request" to the Debug level. To compensate,
expvar counters are added for these.

These types of warnings are normal in some situations (e.g. at startup)
but can indicate serious issues in the other cases.

This is part of #1931, which has the overall goal for switching the
default logging level to INFO.
tsg pushed a commit to tsg/beats that referenced this issue Jul 4, 2016
This is part of elastic#1931 and fixes elastic#1734.

Some things to note/discuss:

* `-v` does nothing by default now. However, it's possible that you
set it to error in the config file and then use -v to go back to
info level at the CLI. So I kept the option.
* the `logging.level: debug` from the default configuration also
doesn nothing now, unless you also specify some selectors. So I added
also a `logging.selectors: ["*"]` to the default short config.
* While not all of elastic#1931 is done yet, I think we can proceed with the
default level change to fix elastic#1734 and then we do more INFO/WARN cleanups
in future PRs.
ruflin pushed a commit that referenced this issue Jul 4, 2016
This is part of #1931 and fixes #1734.

Some things to note/discuss:

* `-v` does nothing by default now. However, it's possible that you
set it to error in the config file and then use -v to go back to
info level at the CLI. So I kept the option.
* the `logging.level: debug` from the default configuration also
doesn nothing now, unless you also specify some selectors. So I added
also a `logging.selectors: ["*"]` to the default short config.
* While not all of #1931 is done yet, I think we can proceed with the
default level change to fix #1734 and then we do more INFO/WARN cleanups
in future PRs.
tsg pushed a commit to tsg/beats that referenced this issue Jul 5, 2016
Prints the delta since the last period, if it's non-zero. The idea is
to help with troubleshooting while still creating relatively little output
in the logs. 30s seems like a good compromise.

Part of/related to elastic#1931
ruflin pushed a commit that referenced this issue Jul 5, 2016
Prints the delta since the last period, if it's non-zero. The idea is
to help with troubleshooting while still creating relatively little output
in the logs. 30s seems like a good compromise.

Part of/related to #1931
tsg pushed a commit to tsg/beats that referenced this issue Jul 5, 2016
Part of elastic#1931, this applies the metricbeat conventions to the names
of the expvars. Most of the counters are still top level integers.
tsg pushed a commit to tsg/beats that referenced this issue Jul 5, 2016
Part of elastic#1931, this adds logic to descend into maps. This is useful especially
to support the metricbeat fetches stats. Locking should be safe as the expvar.Map
Do function takes a read lock on the map.

Note that expvars of type Func are still not included, which is why
the memstats are not logged.
andrewkroh pushed a commit that referenced this issue Jul 5, 2016
Part of #1931, this adds logic to descend into maps. This is useful especially
to support the metricbeat fetches stats. Locking should be safe as the expvar.Map
Do function takes a read lock on the map.

Note that expvars of type Func are still not included, which is why
the memstats are not logged.
ruflin pushed a commit that referenced this issue Jul 6, 2016
Part of #1931, this applies the metricbeat conventions to the names
of the expvars. Most of the counters are still top level integers.
tsg pushed a commit to tsg/beats that referenced this issue Jul 6, 2016
Part of elastic#1931. Replaces two of the "periodic" Infos with Debugs + expvars.
ruflin pushed a commit that referenced this issue Jul 6, 2016
Part of #1931. Replaces two of the "periodic" Infos with Debugs + expvars.
tsg pushed a commit to tsg/beats that referenced this issue Jul 7, 2016
Transform a debug in Info for the loadbalancing mode, to make sure
we have at least an Info on any connection failure.

Part of elastic#1931.
urso pushed a commit that referenced this issue Jul 7, 2016
Transform a debug in Info for the loadbalancing mode, to make sure
we have at least an Info on any connection failure.

Part of #1931.
tsg pushed a commit to tsg/beats that referenced this issue Aug 9, 2016
Part of elastic#1931. The default log level was changed to INFO in PR elastic#1952,
but the change wasn't reflected in the default configuration files. This
fixes it. The issue was signaled in elastic#2183.

Also fixes the logging level in the docs.
ruflin pushed a commit that referenced this issue Aug 9, 2016
Part of #1931. The default log level was changed to INFO in PR #1952,
but the change wasn't reflected in the default configuration files. This
fixes it. The issue was signaled in #2183.

Also fixes the logging level in the docs.
@tsg tsg removed the v5.0.0-beta1 label Aug 22, 2016
tsg pushed a commit to tsg/beats that referenced this issue Aug 23, 2016
Replaces the similar functionality from Metricbeat and Winlogbeat,
generalizing it to all Beats. Part of elastic#1931.
ruflin pushed a commit that referenced this issue Aug 23, 2016
Replaces the similar functionality from Metricbeat and Winlogbeat,
generalizing it to all Beats. Part of #1931.
tsg pushed a commit to tsg/beats that referenced this issue Aug 24, 2016
ruflin pushed a commit that referenced this issue Aug 24, 2016
@tsg
Copy link
Contributor Author

tsg commented Aug 24, 2016

All done, now.

@tsg tsg closed this as completed Aug 24, 2016
@tsg tsg removed the in progress Pull request is currently in progress. label Aug 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants