Beats central monitoring Phase 1 #3422

monicasarbu · 2017-01-19T20:39:28Z

When you deploy a large number of Beats, it becomes challenging to monitor the Beats themselves.
A solution would be for the Beats to report health status to a collection point, such as Elasticsearch, and visualize it with Kibana.

The following health metrics should be sent to Elasticsearch:

libbeat.beat.cpu_usage
libbeat.beat.memory_usage
libbeat.publisher.published_events
libbeat.publisher.messages_in_worker_queues
libbeat.outputs.acked_events (see "Export health metrics via expvar" #3423)
libbeat.outputs.messages_dropped
libbeat.outputs.send_bytes (see "Export health metrics via expvar" #3423)
libbeat.outputs.failures (see "Export health metrics via expvar" #3423)

Each Beat exports more metrics via expvar, but it should send only a subset of these metrics to Elasticsearch.
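As a rough illustration of the "send only a subset" idea, here is a minimal Python sketch that filters a metrics snapshot down to the health metrics listed above. The metric names come from this issue; the snapshot layout (a flat mapping of dotted names to numbers) and the function name `select_health_metrics` are assumptions made for the example, not the actual libbeat implementation.

```python
# Metric names are taken from the list in this issue. The snapshot layout
# (a flat dict of dotted names to numbers) is an assumption for illustration.
HEALTH_METRICS = (
    "libbeat.beat.cpu_usage",
    "libbeat.beat.memory_usage",
    "libbeat.publisher.published_events",
    "libbeat.publisher.messages_in_worker_queues",
    "libbeat.outputs.acked_events",
    "libbeat.outputs.messages_dropped",
    "libbeat.outputs.send_bytes",
    "libbeat.outputs.failures",
)


def select_health_metrics(snapshot):
    """Return only the health metrics that are present in the snapshot."""
    return {name: snapshot[name] for name in HEALTH_METRICS if name in snapshot}
```

Metrics missing from the snapshot are skipped rather than reported as zero, so a Beat that does not export a given counter does not produce misleading values.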

By default, the health metrics are sent directly to the Elasticsearch cluster configured in outputs.elasticsearch, but you can also configure a separate Elasticsearch cluster to receive the monitoring data.

TODO:

Differentiate between the metrics exported via expvar to send only a subset
Send the health metrics to Elasticsearch

Configuration:

monitoring:
    enabled: true
    period: 10s
    elasticsearch: ["localhost:9201"]
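
The fallback described above (use a dedicated monitoring cluster if one is configured, otherwise the regular Elasticsearch output) could be sketched as follows in Python. This assumes the configuration has already been parsed into nested dicts. The `output.elasticsearch.hosts` layout, the function name `monitoring_hosts`, and treating a missing `enabled` as false are all assumptions for the example; the default for `enabled` is still an open question in this issue.

```python
def monitoring_hosts(config):
    """Pick the Elasticsearch hosts that should receive monitoring data.

    `config` is the parsed Beat configuration as nested dicts.
    The `monitoring.elasticsearch` key follows the proposal in this issue;
    the `output.elasticsearch.hosts` layout is an assumption.
    """
    monitoring = config.get("monitoring") or {}
    # Assumption: monitoring is off unless explicitly enabled.
    if not monitoring.get("enabled", False):
        return []
    dedicated = monitoring.get("elasticsearch")
    if dedicated:
        # A dedicated monitoring cluster takes precedence.
        return list(dedicated)
    # Otherwise fall back to the regular Elasticsearch output, if any.
    output = (config.get("output") or {}).get("elasticsearch") or {}
    return list(output.get("hosts") or [])
```

With the configuration shown above, this would return `["localhost:9201"]`; removing the `elasticsearch` entry under `monitoring` makes it fall back to the hosts of the regular output.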

UPDATE: The CPU usage is exported under different fields. See #3422

cc-ed @bohyun-e @brandonmensing


ruflin · 2017-01-20T05:37:48Z

As reference, here is the old issue where this all started: #463

bohyun-e · 2017-01-20T08:43:03Z

enabled: true

Would the default value be true here? or false?

period: 10s

I'm not 100% sure what happens when you have the collection period that is different from the ES monitoring collection interval. But reading from the doc, my gut feeling is that whatever ES's collection interval is - should be applied in other products, such as Kibana Monitoring collection interval. I'm guessing it would be the same for Beats, but it'd be a good idea to confirm.

uboness · 2017-01-20T14:21:26Z

I'm not 100% sure what happens when you have the collection period that is different from the ES monitoring collection interval. But reading from the doc, my gut feeling is that whatever ES's collection interval is - should be applied in other products, such as Kibana Monitoring collection interval. I'm guessing it would be the same for Beats, but it'd be a good idea to confirm.

I don't think we should have this restriction (that all collection intervals are equal). I also don't know how we can even enforce it.

Different systems may need different intervals, and the monitoring UI should deal with that.
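
One way a consumer could cope with mixed intervals, sketched in Python under stated assumptions: convert cumulative counters into per-second rates, so that a Beat reporting every 10s and one reporting every 30s become directly comparable. The sample format (time-sorted `(timestamp_seconds, counter_value)` pairs) and the function name `per_second_rates` are hypothetical; this is not how the monitoring UI is known to work.

```python
def per_second_rates(samples):
    """Convert cumulative counter samples into per-second rates.

    `samples` is a list of (timestamp_seconds, counter_value) pairs,
    sorted by timestamp. Each rate is reported at the later timestamp.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or v1 < v0:
            # Skip duplicate timestamps and counter resets (e.g. a restart).
            continue
        rates.append((t1, (v1 - v0) / elapsed))
    return rates
```

Because each rate is divided by the actual elapsed time between two samples, the collection period does not need to be known or shared between systems.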

lswith · 2017-01-24T00:57:50Z

It would be extremely nice to simply have one more commandline option to turn on only expvar variables, rather than the httpprof commandline option. This would allow other tools to scrape each beat type on their own interval.
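
For illustration, an external tool scraping a Beat on its own interval might look like the Python sketch below. It assumes the Beat serves Go's standard expvar JSON document at `/debug/vars`; the address `localhost:6060` is a placeholder, and the function names are made up for the example.

```python
import json
from urllib.request import urlopen


def parse_expvar(body):
    """Decode an expvar JSON document and keep only numeric entries.

    Go's expvar output also contains non-numeric entries such as
    `cmdline` (a list) and `memstats` (an object); those are dropped.
    """
    data = json.loads(body)
    return {
        key: value
        for key, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def scrape(url="http://localhost:6060/debug/vars", timeout=5):
    """Fetch one metrics snapshot. The host and port are placeholders."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_expvar(resp.read().decode("utf-8"))
```

A scraper would call `scrape()` in a loop with whatever sleep interval suits that Beat type, independently of any other collector.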

bohyun-e · 2017-01-25T19:22:44Z

cc: @pickypg @tsullivan @skearns64

valentin-fischer · 2017-08-22T08:45:47Z

Hi,

Any progress on this? Is there a way to export this as JSON rather than sending everything to Elasticsearch?

Thank you!

jeremydonahue · 2017-09-05T20:01:01Z

+1 for a progress update.

It would be extremely nice to simply have one more commandline option to turn on only expvar variables, rather than the httpprof commandline option. This would allow other tools to scrape each beat type on their own interval.

+1

It would also be very nice to support outputs other than Elasticsearch. Ideally any of the already supported outputs (e.g. Kafka, Redis, Logstash, etc.) would work:

monitoring:
    enabled: true
    period: 10s
    output.elasticsearch: 
        hosts: ["localhost:9201"]
        ...
    output.kafka:
        hosts: ["localhost:9092"]
        ...

Thanks!

monicasarbu · 2017-09-05T20:13:15Z

Unfortunately, we haven't made much progress here. In the first version, we plan to send the monitoring data to Elasticsearch only, but we are considering sending it to other supported outputs in the future.

superwhykz · 2017-09-22T03:11:36Z

For Kafka, it would be nice to send to a different topic that can be defined under the monitoring struct. We're heavily scaling Filebeat in our infra (18k+), and none of the instances ship directly to Elasticsearch.

trondhindenes · 2017-10-25T20:39:41Z

I'd love it if Beats followed the same model as Logstash and simply exposed a local metrics endpoint. We're struggling to write robust monitoring for Filebeat, as it's very much a "black box" when it comes to state. I'm not sure an Elasticsearch metrics integration would help all that much. Beats log files are not geared towards getting at the "current state" of the Beat (e.g. "is the Beat able to ship data to Logstash right now?"). Enabling a local metrics endpoint à la Logstash that exposes items such as "queued events" and "percentage of dead/alive shipper targets", so that we could pick up that info locally using a monitoring tool or agent such as Prometheus or Datadog, would be of much higher value to us than getting it in Elasticsearch.

tsg · 2018-01-31T18:13:25Z

@trondhindenes An experimental http endpoint is added in Beats 6.2. See #3717

trondhindenes · 2018-02-01T00:22:06Z

Awesome!

tsg · 2018-02-28T15:04:54Z

We can consider phase 1 completed.

monicasarbu added enhancement libbeat meta labels Jan 19, 2017

This was referenced Jan 19, 2017

Beats central monitoring via Elasticsearch - Phase 1 #463

Closed

Export libbeat.outputs.acked_events via expvar #3411

Closed

Export health metrics via expvar #3423

Closed

monicasarbu mentioned this issue Jan 20, 2017

Export health metrics #3432

Merged

36 tasks

ruflin mentioned this issue Mar 20, 2017

Running Metricbeat in kubernetes elastic/beats-docker#3

Closed

ruflin mentioned this issue Aug 18, 2017

Provide apm-server status elastic/apm-server#68

Closed

ruflin mentioned this issue Sep 1, 2017

Push onboarding doc on startup elastic/apm-server#117

Merged

tsg closed this as completed Feb 28, 2018

lucabelluccini mentioned this issue Oct 24, 2018

Enhancement - Metricbeat is not able to run if it has no write access to path.data #8731

Closed
