Skip to content

Commit

Permalink
Log non-zero expvars periodically
Browse files Browse the repository at this point in the history
Prints the delta since the last period, if it's non-zero. The idea is
to help with troubleshooting while still creating relatively little output
in the logs. 30s seems like a good compromise.

Part of/related to elastic#1931
  • Loading branch information
Tudor Golubenco committed Jul 5, 2016
1 parent fefa0f3 commit b378ed8
Show file tree
Hide file tree
Showing 11 changed files with 205 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@ https://github.com/elastic/beats/compare/v5.0.0-alpha3...master[Check the HEAD d

*Affecting all Beats*

- Periodically log internal metrics. {pull}1955[1955]

*Metricbeat*

*Packetbeat*
Expand Down
8 changes: 8 additions & 0 deletions filebeat/filebeat.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -693,6 +693,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, filebeat periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down
8 changes: 8 additions & 0 deletions libbeat/_meta/config.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -479,6 +479,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, beatname periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down
21 changes: 21 additions & 0 deletions libbeat/docs/loggingconfig.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,27 @@ all the debug messages related to event publishing. When starting the Beat,
selectors can be overwritten using the `-d` command line option (`-d` also sets
the debug log level).

===== metrics.enabled

If enabled, {beatname_uc} periodically logs its internal metrics that have
changed in the last period. For each metric that changed, the delta from the
value at the beginning of the period is logged. The default is true.

Here is an example log line:

[source,shell]
----------------------------------------------------------------------------------------------------------------------------------------------------
2016/07/04 12:16:10.821047 logp.go:194: INFO Non-zero metrics in the last 30s: libbeatEsPublishedAndAckedEvents=18 libbeatEsPublishWriteBytes=10045
----------------------------------------------------------------------------------------------------------------------------------------------------

Note that we currently offer no backwards compatible guarantees for the internal
metrics and for this reason they are also not documented.


===== metrics.period

The period after which to log the internal metrics. The default is 30s.

===== files.path

The directory that log files are written to. The default is the logs path. See the
Expand Down
69 changes: 69 additions & 0 deletions libbeat/logp/logp.go
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
package logp

import (
"expvar"
"flag"
"fmt"
"io/ioutil"
"log"
"strconv"
"strings"
"time"

"github.com/elastic/beats/libbeat/paths"
)
Expand All @@ -21,8 +24,18 @@ type Logging struct {
ToSyslog *bool `config:"to_syslog"`
ToFiles *bool `config:"to_files"`
Level string
Metrics LoggingMetricsConfig `config:"metrics"`
}

type LoggingMetricsConfig struct {
Enabled *bool `config:"enabled"`
Period *time.Duration `config:"period" validate:"nonzero,min=0s"`
}

var (
defaultMetricsPeriod = 30 * time.Second
)

func init() {
// Adds logging specific flags: -v, -e and -d.
verbose = flag.Bool("v", false, "Log at INFO level")
Expand Down Expand Up @@ -112,6 +125,8 @@ func Init(name string, config *Logging) error {
log.SetOutput(ioutil.Discard)
}

go logExpvars(&config.Metrics)

return nil
}

Expand Down Expand Up @@ -141,3 +156,57 @@ func getLogLevel(config *Logging) (Priority, error) {
}
return level, nil
}

// snapshotExpvars iterates through all the defined expvars, and for the top
// level vars that are integers it snapshots the name and value in a separate
// map.
func snapshotExpvars(varsMap map[string]int64) {
expvar.Do(func(kv expvar.KeyValue) {
switch kv.Value.(type) {
case *expvar.Int:
varsMap[kv.Key], _ = strconv.ParseInt(kv.Value.String(), 10, 64)
}
})
}

// buildMetricsOutput makes the delta between vals and prevVals and builds
// a printable string with the non-zero deltas.
func buildMetricsOutput(prevVals map[string]int64, vals map[string]int64) string {
metrics := ""
for k, v := range vals {
delta := v - prevVals[k]
if delta != 0 {
metrics = fmt.Sprintf("%s %s=%d", metrics, k, delta)
}
}
return metrics
}

// logExpvars logs at Info level the integer expvars that have changed in the
// last interval. For each expvar, the delta from the beginning of the interval
// is logged.
func logExpvars(metricsCfg *LoggingMetricsConfig) {
if metricsCfg.Enabled != nil && *metricsCfg.Enabled == false {
Info("Metrics logging disabled")
return
}
if metricsCfg.Period == nil {
metricsCfg.Period = &defaultMetricsPeriod
}
Info("Metrics logging every %s", metricsCfg.Period)

ticker := time.NewTicker(*metricsCfg.Period)
prevVals := map[string]int64{}
for {
<-ticker.C
vals := map[string]int64{}
snapshotExpvars(vals)
metrics := buildMetricsOutput(prevVals, vals)
prevVals = vals
if len(metrics) > 0 {
Info("Non-zero metrics in the last %s:%s", metricsCfg.Period, metrics)
} else {
Info("No non-zero metrics in the last %s", metricsCfg.Period)
}
}
}
57 changes: 57 additions & 0 deletions libbeat/logp/logp_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
// +build !integration

package logp

import (
"expvar"
"testing"

"github.com/stretchr/testify/assert"
)

func TestSnapshotExpvars(t *testing.T) {
test := expvar.NewInt("test")
test.Add(42)

vals := map[string]int64{}
snapshotExpvars(vals)

assert.Equal(t, vals["test"], int64(42))
}

func TestBuildMetricsOutput(t *testing.T) {
test := expvar.NewInt("testLog")
test.Add(1)

prevVals := map[string]int64{}
snapshotExpvars(prevVals)

test.Add(5)

vals := map[string]int64{}
snapshotExpvars(vals)

metrics := buildMetricsOutput(prevVals, vals)
assert.Equal(t, " testLog=5", metrics)
prevVals = vals

test.Add(3)
vals = map[string]int64{}
snapshotExpvars(vals)
metrics = buildMetricsOutput(prevVals, vals)
assert.Equal(t, " testLog=3", metrics)
}

func TestBuildMetricsOutputMissing(t *testing.T) {

prevVals := map[string]int64{}
snapshotExpvars(prevVals)

test := expvar.NewInt("testLogEmpty")
test.Add(7)

vals := map[string]int64{}
snapshotExpvars(vals)
metrics := buildMetricsOutput(prevVals, vals)
assert.Equal(t, " testLogEmpty=7", metrics)
}
6 changes: 6 additions & 0 deletions libbeat/tests/system/config/mockbeat.yml.j2
Original file line number Diff line number Diff line change
Expand Up @@ -72,4 +72,10 @@ output:
rotate_every_kb: 1000
#number_of_files: 7

#================================ Logging =====================================

{% if metrics_period -%}
logging.metrics.period: {{ metrics_period }}
{%- endif %}

# vim: set ft=jinja:
10 changes: 10 additions & 0 deletions libbeat/tests/system/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,3 +115,13 @@ def test_console_output_size_flush(self):
self.wait_until(lambda: self.log_contains("Mockbeat is alive"),
max_timeout=2)
proc.check_kill_and_wait()

def test_logging_metrics(self):
self.render_config_template(
metrics_period="0.1s"
)
proc = self.start_beat(logging_args=["-e"])
self.wait_until(
lambda: self.log_contains("No non-zero metrics in the last 100ms"),
max_timeout=2)
proc.check_kill_and_wait()
8 changes: 8 additions & 0 deletions metricbeat/metricbeat.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -624,6 +624,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, metricbeat periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down
8 changes: 8 additions & 0 deletions packetbeat/packetbeat.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -863,6 +863,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, packetbeat periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down
8 changes: 8 additions & 0 deletions winlogbeat/winlogbeat.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -514,6 +514,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, winlogbeat periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down

0 comments on commit b378ed8

Please sign in to comment.