Proposal: Use Prometheus for scraping & storage #645

Closed
jimmidyson opened this issue Oct 9, 2015 · 17 comments

@jimmidyson
Contributor

Right now, Heapster performs all the tasks of scraping, caching, exporting to external storage, & serving a REST API for retrieval.

The same metrics that Heapster retrieves via the /stats endpoint on the kubelet are also exposed for Prometheus to scrape via /metrics. How do you feel about Heapster becoming a query & abstraction layer on top of Prometheus to provide Kubernetes semantics to Prometheus timeseries?

Prometheus is also used in a number of other components: etcd & skydns being the two main core Kubernetes ones I can think of straight off.

I fairly recently added Kubernetes discovery to Prometheus, so we have all container metrics ingested, as well as application-level metrics (future work for Heapster) thanks to the wide range of language plugins & external metric exporters in the Prometheus ecosystem.
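
To make that concrete, here's a sketch of a Prometheus scrape config using that Kubernetes service discovery to find and scrape every kubelet. Note this is written in the later `kubernetes_sd_configs` role syntax, which postdates this thread; the 2015-era configuration keys differed:

```yaml
scrape_configs:
  - job_name: kubernetes-nodes
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      # Discover one target per node; Prometheus then scrapes the
      # kubelet's /metrics endpoint on each discovered node.
      - role: node
```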

Prometheus has export capabilities similar to Heapster's (albeit without a GCM sink atm, but that should be easy enough to add in).

Alerting, sharding & federation are already built in to Prometheus.

I could see Heapster becoming a REST API to convert Kubernetes semantic queries to Prometheus queries, with Prometheus providing awesome collection & (configurable) short-term storage of metrics.
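
To illustrate the idea, here is a minimal sketch (in Go, with all endpoint and metric names illustrative rather than taken from either project) of such a translation layer: it accepts a Kubernetes-semantic request for a pod's CPU usage, builds a PromQL query, and proxies it to Prometheus's HTTP query API:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// podCPUHandler translates a Kubernetes-semantic request for a pod's
// CPU usage into a PromQL query against Prometheus's query API.
// Endpoint paths, label names, and the Prometheus address are
// illustrative assumptions, not part of either project.
func podCPUHandler(w http.ResponseWriter, r *http.Request) {
	namespace := r.URL.Query().Get("namespace")
	pod := r.URL.Query().Get("pod")

	// Aggregate the cAdvisor-style per-container counters over the pod.
	promql := fmt.Sprintf(
		`sum(rate(container_cpu_usage_seconds_total{namespace=%q,pod=%q}[5m]))`,
		namespace, pod,
	)

	resp, err := http.Get(
		"http://prometheus:9090/api/v1/query?query=" + url.QueryEscape(promql),
	)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	// A real implementation would convert Prometheus's JSON response
	// into Heapster/Kubernetes API types; this sketch just proxies it.
	w.Header().Set("Content-Type", "application/json")
	io.Copy(w, resp.Body)
}

func main() {
	http.HandleFunc("/api/v1/model/pod-cpu", podCPUHandler)
	http.ListenAndServe(":8082", nil)
}
```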

Comments?

@vishh
Contributor

vishh commented Oct 9, 2015

> The same metrics that Heapster retrieves via the /stats endpoint on the kubelet are also exposed for Prometheus to scrape via /metrics. How do you feel about Heapster becoming a query & abstraction layer on top of Prometheus to provide Kubernetes semantics to Prometheus timeseries?

As of now, I think prometheus can be one of the backends for heapster.

> Prometheus is also used in a number of other components: etcd & skydns being the two main core Kubernetes ones I can think of straight off.

I assume you are referring to metrics being in prometheus format. AFAIK, that was done because prometheus client libs were the most expressive ones available at the time.
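
For illustration, a minimal sketch of instrumenting a Go service with prometheus/client_golang (shown with the current promhttp API, which is newer than this thread). The labelled metric dimensions are the expressiveness being referred to:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A labelled counter: the label dimensions are what make the
// Prometheus exposition format so expressive.
var requestsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests served, by handler and status code.",
	},
	[]string{"handler", "code"},
)

func main() {
	prometheus.MustRegister(requestsTotal)
	requestsTotal.WithLabelValues("/healthz", "200").Inc()

	// Expose everything registered above on /metrics, the same
	// endpoint convention the kubelet and etcd use.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```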

> I fairly recently added Kubernetes discovery to Prometheus, so we have all container metrics ingested, as well as application-level metrics (future work for Heapster) thanks to the wide range of language plugins & external metric exporters in the Prometheus ecosystem.

This is great. This will definitely be helpful.

> Alerting, sharding & federation are already built in to Prometheus.

Do we have any scalability numbers on prometheus?

Prometheus will be a good solution for monitoring.
But I'm not convinced that we should require all Kubernetes clusters to run prometheus.
We need input from users here. We can send out a survey or discuss this in the weekly hangout.
AFAIK, users love and care about their own monitoring systems and requiring them to run prometheus might not be ideal.

One more concern: in the future I want heapster to optionally store metrics in a crowd-shared database and use that data for resource prediction purposes. I haven't had the time to flesh this idea out completely.

@vishh
Contributor

vishh commented Oct 9, 2015

We want kubernetes to have minimal dependencies for bootstrapping purposes.
We will need reliable, low-latency access to node resource usage metrics, which is why heapster collects and serves these metrics directly.
Internally, we have never depended on a timeseries DB for critical cluster functionality like scheduling.
In this regard, we can have prometheus serve as the source of non-critical data that is not time-sensitive.

@thucatebay
Contributor

Since Prometheus has alerting capability, it'd make for a good out-of-the-box experience. However, as a cluster grows in terms of nodes and pods, it becomes a non-trivial task to operate and scale a metric store such as Prometheus, InfluxDB, Graphite, etc. At eBay we have our own monitoring and alerting system, which we're planning to use for Kubernetes. Heapster fits well in this model since all we have to do is to write a sink. How about making Prometheus the default metric store instead of InfluxDB?
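
For readers unfamiliar with the model @thucatebay describes, the sink abstraction looks roughly like the sketch below; the type names here are stand-ins, not Heapster's exact interface:

```go
package sinks

import "time"

// MetricValue and DataBatch are stand-ins for Heapster's internal
// types; the real definitions live in the heapster repository.
type MetricValue struct {
	Name   string
	Labels map[string]string
	Value  float64
}

type DataBatch struct {
	Timestamp time.Time
	Metrics   []MetricValue
}

// Sink is the shape of the abstraction being discussed: implement it
// once for your backend and Heapster pushes each scraped batch to it.
type Sink interface {
	Name() string
	ExportData(batch *DataBatch)
	Stop()
}
```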

@jimmidyson
Contributor Author

@thucatebay Thanks for the feedback! I was thinking of Prometheus in this scenario as a short-term (e.g. one-day) store, similar to how Heapster operates now, with rules for aggregating metrics as they come in to keep storage & memory requirements low. This would keep its management simple but bring with it extra benefits:

  • application level metrics (future work for heapster)
  • persistence (afaik if heapster pod dies you lose all stats which could affect autoscaling)
  • sharding & federation (future work for heapster)

The only difference for you would be to write an external storage plugin for Prometheus as opposed to a Heapster sink.

It would also mean that places that don't have their own monitoring & alerting system (as you do) would be able to expand the environment by adding in the Prometheus Alertmanager if they wanted, though certainly not as a requirement.

@vishh
Contributor

vishh commented Oct 13, 2015

I agree that Prometheus is a good candidate for a better out-of-the-box monitoring experience. AFAIK there is at least one scenario where we will not run Prometheus: Google Container Engine.

I'd like to split core-cluster functionality from monitoring. Disabling monitoring using heapster is totally fine, but I don't think we can disable collection and processing of the core-cluster metrics that are required to bootstrap cluster functionality like scheduling. Even in the case of auto-scaling, I'd imagine us wanting to use curated metrics.


@spiffxp
Contributor

spiffxp commented Oct 13, 2015

@vishh I'm a bit confused, are you saying that the heapster addon's presence is currently required for kube-scheduler to work properly?

@vishh
Contributor

vishh commented Oct 13, 2015

@spiffxp: Moving forward, heapster (standalone) will be run by default on all kubernetes clusters. It will serve the metrics APIs consumed by the scheduler, auto-scalers, etc.
The term "addon" is a misnomer, because other addons like dns are also required for default kubernetes functionality.
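
For readers landing here later: these metrics APIs eventually took shape as the resource metrics API (metrics.k8s.io), served first by Heapster and later by metrics-server. Consumers such as the horizontal pod autoscaler read endpoints of roughly this shape (paths shown for illustration):

```
GET /apis/metrics.k8s.io/v1beta1/nodes
GET /apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods
```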

@spiffxp
Contributor

spiffxp commented Oct 13, 2015

@vishh yeah, but how about today? Is this required for proper functioning of v1.0.x or 1.1.x?

@vishh
Contributor

vishh commented Oct 13, 2015

It is not required for v1.0.x.
It is required for beta features in v1.1.x.


@jayunit100

There already seems to be pretty close coupling to prometheus on the metrics front. Sorta seems like overkill to maintain a separate timeseries framework? But I see both sides of the coin here.

@jimmycuadra

What's the current state of this? I've read the vision document, but it's still not clear if there is or will be support for Prometheus as a sink for heapster. It's confusing that Prometheus has emerged as the go-to monitoring system for Kubernetes, especially given that it's also a member of the CNCF, and yet when you deploy the cluster monitoring addon for Kubernetes, it uses an InfluxDB sink for Heapster, plus Grafana for visualizations. This means that cluster operators who want metrics with a larger scope than Heapster is intended for must maintain two separate time series databases.

@DirectXMan12
Contributor

We're currently transitioning away from Heapster as the de facto solution, as per the new monitoring vision in the community repo. One of the results of that will be an end-to-end setup with Prometheus that does not involve Heapster.

@DirectXMan12
Contributor

(in light of that, I'm closing this issue)

@davidkarlsen

@DirectXMan12 "We're currently transitioning away from Heapster as the de facto solution, as per the new monitoring vision in the community repo." - do you have any references/docs for that? (I guess it's not the vision doc mentioned above, since that one refers to heapster.)

@DirectXMan12
Contributor

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/monitoring_architecture.md should be what you're looking for.

@monotek

monotek commented Sep 22, 2017

The link is dead :-(
Any mirror available?

I'm currently trying to find out what the standard / best-practice monitoring solution for Kubernetes is.

I thought it was cAdvisor + Prometheus. Then I read about Heapster, which seems to be dead according to @davidkarlsen's post.

I'm a bit confused now. Where to start?

@spiffxp
Contributor

spiffxp commented Sep 22, 2017
