Proposal: Use Prometheus for scraping & storage #645

jimmidyson · 2015-10-09T18:24:06Z

Right now, Heapster performs all the tasks of scraping, caching, exporting to external storage, & REST API for retrieval.

The same metrics that Heapster retrieves via the /stats endpoint on the kubelet are also exposed for Prometheus to scrape via /metrics. How do you feel about Heapster becoming a query & abstraction layer on top of Prometheus to provide Kubernetes semantics to Prometheus timeseries?
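As a rough illustration of what scraping `/metrics` involves, the sketch below parses a few lines of the Prometheus text exposition format using only the standard library. The sample payload, its label names, and the `parse_samples` helper are made up for this example; a real kubelet response has many more series, and this simplified parser ignores escaped label values and skips lines that carry a trailing timestamp.

```python
import re

# Illustrative payload in the Prometheus text exposition format.
# The label names and values here are assumptions made for this sketch.
SAMPLE = '''\
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{namespace="default",pod="web-1"} 12.5
container_cpu_usage_seconds_total{namespace="default",pod="web-2"} 7.25
container_memory_usage_bytes{namespace="kube-system",pod="dns-1"} 1048576
'''

# metric_name{optional="labels"} value
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith('#'):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            # Simplification: anything this sketch cannot parse is ignored.
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_samples(SAMPLE)
```

Each parsed sample is a name plus a set of labels, which is the shape a Kubernetes-aware layer would then map onto pods and namespaces.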

Prometheus is also used in a number of other components: etcd & skydns being the two main core Kubernetes ones I can think of straight off.

I fairly recently added Kubernetes discovery to Prometheus so we have all container metrics ingested, as well as application level metrics (future work for Heapster) thanks to the wide range of language plugins & external metric exporters in the Prometheus ecosystem.

Prometheus has export capabilities the same as Heapster (albeit without a GCM sink atm but that should be easy enough to add in).

Alerting, sharding & federation are already built into Prometheus.

I could see Heapster becoming a REST API to convert Kubernetes semantic queries to Prometheus queries, with Prometheus providing awesome collection & (configurable) short-term storage of metrics.
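To make the translation idea concrete, here is a minimal sketch of turning a Kubernetes-level question ("CPU usage rate of this pod") into a PromQL string and a URL for the Prometheus HTTP query endpoint. The function names, the `namespace` and `pod` label names, and the base URL are assumptions for illustration only; the labels actually attached to container metrics depend on how discovery and relabelling are configured.

```python
from urllib.parse import urlencode


def escape_label_value(value):
    """Escape backslashes and double quotes for use inside a PromQL string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def pod_cpu_rate_query(namespace, pod, window="5m"):
    """Build a PromQL expression for the CPU usage rate of one pod.

    The label names `namespace` and `pod` are hypothetical.
    """
    selector = 'container_cpu_usage_seconds_total{namespace="%s",pod="%s"}' % (
        escape_label_value(namespace),
        escape_label_value(pod),
    )
    return "sum(rate(%s[%s]))" % (selector, window)


def query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


promql = pod_cpu_rate_query("default", "web-1")
# The host below is a placeholder, not a real endpoint.
url = query_url("http://prometheus.example:9090/", promql)
```

In this picture the REST layer owns the mapping from Kubernetes objects to label selectors, and Prometheus only ever sees ordinary queries.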

Comments?


vishh · 2015-10-09T18:46:43Z

> The same metrics that Heapster retrieves via the /stats endpoint on the kubelet are also exposed for Prometheus to scrape via /metrics. How do you feel about Heapster becoming a query & abstraction layer on top of Prometheus to provide Kubernetes semantics to Prometheus timeseries?

As of now, I think prometheus can be one of the backends for heapster.

> Prometheus is also used in a number of other components: etcd & skydns being the two main core Kubernetes ones I can think of straight off.

I assume you are referring to metrics being in prometheus format. AFAIK, that was done because prometheus client libs were the most expressive ones as of now.

> I fairly recently added Kubernetes discovery to Prometheus so we have all container metrics ingested, as well as application level metrics (future work for Heapster) thanks to the wide range of language plugins & external metric exporters in the Prometheus ecosystem.

This is great. This will definitely be helpful.

> Alerting, sharding & federation is already built in to Prometheus.

Do we have any scalability numbers on prometheus?

Prometheus will be a good solution for monitoring.
But I'm not convinced that we should require all Kubernetes clusters to run prometheus.
We need input from users here. We can send out a survey or discuss this in the weekly hangout.
AFAIK, users love and care about their own monitoring systems and requiring them to run prometheus might not be ideal.

One more concern I have is that in the future I want to have heapster optionally store metrics in a crowd-shared database, and use that data for resource prediction purposes. I haven't gotten the time to flesh out this idea completely.

vishh · 2015-10-09T18:50:05Z

We want kubernetes to have minimal dependencies for bootstrapping purposes.
We will need reliable, low latency access to node resource usage metrics and that is the reason why heapster collects and serves these metrics directly.
Internally, we have never depended on timeseries DB for critical cluster functionalities like scheduling.
In this regard, we can have prometheus serve as the source of non-critical data, which are not time-sensitive.

thucatebay · 2015-10-13T13:12:49Z

Since Prometheus has alerting capability, it'd make for a good out-of-the-box experience. However, as a cluster grows in terms of nodes and pods, it becomes a non-trivial task to operate and scale a metric store such as Prometheus, InfluxDB, Graphite, etc. At eBay we have our own monitoring and alerting system, which we're planning to use for Kubernetes. Heapster fits well in this model since all we have to do is to write a sink. How about making Prometheus the default metric store instead of InfluxDB?

jimmidyson · 2015-10-13T13:39:55Z

@thucatebay Thanks for the feedback! I was thinking of Prometheus in this scenario as a day store, similar to how Heapster operates now, with rules for aggregating metrics as they come in to keep storage & memory requirements low. This would keep the management of it simple but bring with it extra benefits of:

application level metrics (future work for heapster)
persistence (afaik if heapster pod dies you lose all stats which could affect autoscaling)
sharding & federation (future work for heapster)

The only difference for you would be to write an external storage plugin for Prometheus as opposed to a Heapster sink.

It would also mean that places without their own monitoring & alerting system, unlike yours, could expand their environment by adding the Prometheus Alertmanager if they wanted, but it would certainly not be a requirement.
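The "aggregate metrics as they come in" idea can be sketched in plain Python as downsampling raw samples into fixed windows. This is not Prometheus rule syntax, only an illustration of why aggregation on ingest keeps storage small; the `downsample` helper and the 60-second window are assumptions for this example.

```python
from collections import defaultdict


def downsample(samples, window_seconds=60):
    """Average (timestamp_seconds, value) samples per fixed time window.

    Returns a list of (window_start, mean_value) sorted by window start.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of the window it falls into.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0), (120, 9.0)]
aggregated = downsample(raw)
```

Five raw samples collapse to three stored points here; with a short retention period on top, memory and disk needs stay bounded.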

vishh · 2015-10-13T17:15:22Z

I agree that Prometheus is a good candidate for a better out-of-the-box monitoring experience.
AFAIK there is at least one scenario where we will not run Prometheus: Google Container Engine.

I'd like to split core-cluster functionalities from monitoring. Disabling monitoring using heapster is totally fine.
But I don't think we can disable collection and processing of core-cluster metrics that are required to bootstrap the cluster functionalities like scheduling.
Even in the case of auto-scaling, I'd imagine us wanting to use curated metrics.

spiffxp · 2015-10-13T20:50:59Z

@vishh I'm a bit confused, are you saying that currently the heapster addon's presence is required for kube-scheduler to be working properly?

vishh · 2015-10-13T22:37:40Z

@spiffxp: Moving forward heapster (standalone) will be run by default on all kubernetes clusters. It will be serving the metrics APIs which will be consumed by the scheduler, auto-scalers, etc.
The term addon is a misnomer because other addons like dns are also required for default kubernetes functionalities.

spiffxp · 2015-10-13T23:03:13Z

@vishh Yeah, but how about today? Is this required for proper functioning of v1.0.x or v1.1.x?

vishh · 2015-10-13T23:22:52Z

It is not required for v1.0.x.
It is required for beta features in v1.1.x.

jayunit100 · 2015-10-15T19:46:18Z

There seems to be pretty close coupling to Prometheus on the metrics front already. It sorta seems like overkill to maintain a separate timeseries framework, but I see both sides of the coin here.

jimmycuadra · 2017-04-07T01:55:43Z

What's the current state of this? I've read the vision document, but it's still not clear if there is or will be support for Prometheus as a sink for heapster. It's confusing that Prometheus has emerged as the go-to monitoring system for Kubernetes, especially given that it's also a member of the CNCF, and yet when you deploy the cluster monitoring addon for Kubernetes, it uses an InfluxDB sink for Heapster, plus Grafana for visualizations. This means that cluster operators who want metrics with a larger scope than Heapster is intended for must maintain two separate time series databases.

DirectXMan12 · 2017-04-12T15:09:03Z

We're currently transitioning away from Heapster as the de facto solution, as per the new monitoring vision in the community repo. One of the results of that will be an end-to-end setup with Prometheus that does not involve Heapster.

DirectXMan12 · 2017-04-12T15:09:22Z

(in light of that, I'm closing this issue)

davidkarlsen · 2017-04-12T15:39:34Z

@DirectXMan12 "We're currently transitioning away from Heapster as the de facto solution, as per the new monitoring vision in the community repo." Do you have any references/docs for that? (I guess it's not the vision doc mentioned above, since that one refers to heapster.)

DirectXMan12 · 2017-04-12T15:49:52Z

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/monitoring_architecture.md should be what you're looking for.

monotek · 2017-09-22T10:38:05Z

The link is dead :-(
Is there a mirror available?

I'm currently trying to find out what the standard / best-practice monitoring solution for Kubernetes is.

I thought it was cAdvisor + Prometheus. Then I read about Heapster, which seems to be dead according to @davidkarlsen's post.

I'm a bit confused now. Where to start?

spiffxp · 2017-09-22T23:32:48Z

try https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/monitoring_architecture.md

kubernetes/community#1010 shuffled around the contents of the design-proposals dir

jimmidyson mentioned this issue Oct 21, 2015

Decide and document the scope of heapster's responsibilities #665

Closed

DirectXMan12 closed this as completed Apr 12, 2017
