Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging: Startup messages log level #1854

Closed
bndw opened this issue Oct 13, 2017 · 14 comments
Closed

Logging: Startup messages log level #1854

bndw opened this issue Oct 13, 2017 · 14 comments
Labels
beginner Good starter issues! help wanted Needs help! tech debt

Comments

@bndw
Copy link
Contributor

bndw commented Oct 13, 2017

Startup messages are currently logged at a warning level, but should probably be at a debug or info level.

[2017-10-13 18:50:51.571][6][warning][main] source/server/server.cc:164] initializing epoch 0 (hot restart version=9.200.16384.127)
[2017-10-13 18:50:51.587][6][warning][main] source/server/server.cc:332] starting main dispatch loop
[2017-10-13 18:50:51.587][6][warning][main] source/server/server.cc:316] all clusters initialized. initializing init manager
[2017-10-13 18:50:51.587][6][warning][config] source/server/listener_manager_impl.cc:448] all dependencies initialized. starting workers

Simply changing their level will likely hide them by default and that is probably undesirable (these messages should probably always be printed).

@mattklein123
Copy link
Member

Most of these messages used to be "notice" level before spdlog removed that level and we switched them all to warn.

@mattklein123 mattklein123 added help wanted Needs help! beginner Good starter issues! labels Oct 28, 2017
@jmarantz
Copy link
Contributor

jmarantz commented Nov 7, 2017

Hi... Was interested in trying an easy task as an intro to envoy. This one looks like it might be appropriate. What's the thinking on strategy here? Make the startup messages be 'info' and have info messages visible during startup by default and and hidden afterward? Other ideas?

@mattklein123
Copy link
Member

@jmarantz my thinking on this is:

  1. Audit all messages that are warnings. Move things to info that aren't really warnings.
  2. Audit all messages that are info. (There actually aren't that many I don't think). Move anything that prints regularly (every connection, on a periodic schedule, etc.) to debug.
  3. Switch default log level to info.

@jmarantz
Copy link
Contributor

jmarantz commented Nov 7, 2017

Got it.I will scope this out. It seems simple enough but likely touches a bunch of files. If the messages can be reasonably judged on text alone, context-free, it may be straightforward enough, and would take me on a nickel your through the code.

@jmarantz
Copy link
Contributor

jmarantz commented Nov 7, 2017

There appear to 37 info messages and 31 warnings. Default loglevel is established in ./exe/main.cc:45 I believe. Looks like a good intro task.

@jmarantz
Copy link
Contributor

jmarantz commented Nov 7, 2017

11 references to ListenerImpl::infoLog as well.

There are a handful of info messages that appear to be startup-time only. Leaving those as is, assuming a little startup noise is OK, but that we are trying to avoid a lot of per-connection logging.

I also found 3 instances of this:

  if (e) {
    ENVOY_LOG(warn, "cds: fetch failure: {}", e->what());
  } else {
    ENVOY_LOG(info, "cds: fetch failure: network error");
  }

I'm guessing this was done as a deliberate attempt to reduce 'network error' noise going into the logs regularly. Is that right? In that case I think the 'network' error path should be changed to debug.

@mattklein123
Copy link
Member

@jmarantz yup all of ^ LGTM.

@jmarantz
Copy link
Contributor

jmarantz commented Nov 7, 2017

Great. I have a change which passes tests, I think, but I would like to see a server under load and measure impact on log size. What's the best way to do that?

@mattklein123
Copy link
Member

@jmarantz not sure, you could try running integration tests at info level potentially and see what prints. You could also try out https://github.com/envoyproxy/envoy-perf.

@mattklein123
Copy link
Member

(In general there should be almost no logging at info level once the server starts up and is forwarding traffic).

@jmarantz
Copy link
Contributor

jmarantz commented Nov 9, 2017

Sorry for the delay -- had some other things going on, and needed to do a bit of learning to understand how to run integration tests. Those exposed some other potentially noisy info logs, some of which were obvious and I fixed. But in testlogs/test/integration/hotrestart_test/test.log I have a case I'm not sure how it's best to handle.

I have lots of log entries like this:

[2017-11-09 16:20:48.257][53716][info][upstream] source/common/upstream/cluster_manager_impl.cc:62] cm init: adding: cluster=cluster_2 primary=0 secondary=0

and

[2017-11-09 16:20:48.284][53716][info][config] source/server/configuration_impl.cc:47] listener #4:
[2017-11-09 16:20:48.287][53716][info][config] source/server/listener_manager_impl.cc:128] add active listener: name=d952e25f-514a-414b-9045-eca0021a3142, hash=14774254519635138978, address=127.0.0.1:0

A burst of a few few dozen of those on startup seem OK. But in this test there 8 such bursts of several dozen. I think this is the nature of the test (hot restart) but I thought I'd just note this concern and see whether these config-type messages should be moved to debug.

@mattklein123
Copy link
Member

@jmarantz yeah I would just move all those to debug.

@jmarantz
Copy link
Contributor

jmarantz commented Nov 9, 2017

Got rid of those, but these still remain. From running all integration tests, I then ran this to look at log noise:

  find . -name '*.log' -exec fgrep \[info\] {} \; >/tmp/info.log
  cut -f4-10 -d\] /tmp/info.log|cut -d\  -f1-2 | grep -v  test/ | sort | uniq -c | sort -n -r

Basically this pipe shows me the sources of messages that come out most frequently in the integration tests. Full results below. I think these seem reasonable but happy to trim further. The remaining high frequency messages all seem to be per server startup or config, not per request:

    404 [upstream] source/common/upstream/cluster_manager_impl.cc:131]
    404 [config] source/server/configuration_impl.cc:83]
    404 [config] source/server/configuration_impl.cc:45]
    404 [config] source/server/configuration_impl.cc:109]
    402 [main] source/server/server.cc:369]
    402 [main] source/server/server.cc:348]
    402 [main] source/server/server.cc:344]
    402 [main] source/server/server.cc:328]
    402 [main] source/server/server.cc:303]
    402 [main] source/server/server.cc:157]
    402 [config] source/server/listener_manager_impl.cc:448]
    396 [main] source/server/server.cc:376]
     22 [upstream] source/common/upstream/cluster_manager_impl.cc:127]
     22 [main] source/server/server.cc:271]
     22 [main] source/server/server.cc:267]
     22 [main] source/server/server.cc:266]
     16 [upstream] source/server/lds_subscription.cc:70]
     16 [main] source/server/server.cc:288]
     16 [client] source/common/http/codec_client.cc:104]
     12 [upstream] source/common/upstream/cluster_manager_impl.cc:105]
     10 [upstream] source/server/lds_api.cc:50]
     10 [upstream] source/common/upstream/cluster_manager_impl.cc:291]
     10 [upstream] source/common/upstream/cds_api_impl.cc:50]
      4 [main] source/server/server.cc:91]
      4 [main] source/server/server.cc:386]
      4 [main] source/server/server.cc:381]
      4 [main] source/server/server.cc:310]
      4 [main] source/server/hot_restart_impl.cc:425]
      2 [upstream] source/server/lds_api.cc:58]
      2 [upstream] source/common/upstream/cluster_manager_impl.cc:320]
      2 [upstream] source/common/upstream/cds_api_impl.cc:57]
      2 [main] source/server/server.cc:315]

@mattklein123
Copy link
Member

@jmarantz good stuff. I would say at this point just post a PR. We can review there / people can run locally if they want to check it out?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beginner Good starter issues! help wanted Needs help! tech debt
Projects
None yet
Development

No branches or pull requests

3 participants