This repository has been archived by the owner on Dec 1, 2018. It is now read-only.

Heapster long term vision #769

Merged · 1 commit · Feb 2, 2016

Conversation

@mwielgus (Contributor) commented Dec 8, 2015

No description provided.

@piosz (Contributor) commented Dec 8, 2015

with 1h resolution), keeps them in memory and exposes via Heapster API. This API is mainly
used by Horizontal Pod Autoscaler which asks for the most recent performance related
metrics to adjust the number of pods to the incoming traffic. The API is also used by KubeDash
(which unfortunately didn’t get enough traction and will be replaced) and will be used
Contributor: This statement is vague. We as kube developers haven't put enough energy into making KubeDash primetime-ready. Why throw away a good piece of software when no alternative exists?

Contributor: Agree. I would just remove the bit about KubeDash.

Author: OK, I can rephrase this sentence.

Author: It was not my decision to start yet another UI, but this is the situation right now. Kubernetes Dashboard is actively developed (https://github.com/kubernetes/dashboard/graphs/contributors) by Google and Fujitsu, has a working prototype, and will probably be delivered for 1.2 to some extent. On the other hand, there is KubeDash, which has been "stable" since early October. It has some very specific requirements, like the 1-day-long CPU usage average, which may or may not be relevant once Kubernetes Dashboard becomes the default/main Kubernetes user interface.

piosz assigned fgrzadkowski and unassigned piosz on Dec 8, 2015

* [UC1] Read metrics from nodes and write them to an external storage.
* [UC2] Expose metrics from the last 2-3 minutes (for HPA and GKE)
* [UC3] Read Events from the API server and write them to a permanent storage
Contributor: What is the reason that event storage is part of heapster rather than a separate tool?

Contributor: Ah I see the Eventer below.

Author: Yep, right now metrics and events are combined into one tool but we are planning to split them.

Contributor: The original plan was to combine events and metrics data and build more interesting signals for end users. I guess even if we split heapster into separate binaries, if we want to build such models, we will have to combine the data somewhere else.

Author: AFAIK there are no immediate plans for any event/metrics combining. Once we decide to do it we can revisit this item and decide what is best:

* having a separate component
* having a sink that also listens to Kubernetes events
* gluing the two binaries back together (unlikely)

If you want to discuss this now please schedule a VC.

Comment: Events are too heavy IMO; I would vote not to try to amalgamate, but leave that to back-end systems to ETL and learn from operational data.
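
Since the thread above settles on a standalone Eventer for UC3, here is a minimal sketch of the shape such a component could take: watch Events from the API server and forward them to a pluggable sink. This is illustrative only and is written against the modern client-go API (which postdates this discussion); `EventSink` and `watchEvents` are hypothetical names, not Heapster's actual types.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// EventSink is a hypothetical interface an eventer would write to; concrete
// implementations could target InfluxDB, Kafka, ElasticSearch, and so on.
type EventSink interface {
	ExportEvents(events []*corev1.Event) error
}

// watchEvents streams cluster Events from the API server into the sink,
// which is the UC3 loop discussed above.
func watchEvents(ctx context.Context, client kubernetes.Interface, sink EventSink) error {
	w, err := client.CoreV1().Events(metav1.NamespaceAll).Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return fmt.Errorf("starting event watch: %w", err)
	}
	defer w.Stop()

	for update := range w.ResultChan() {
		if ev, ok := update.Object.(*corev1.Event); ok {
			if err := sink.ExportEvents([]*corev1.Event{ev}); err != nil {
				return err
			}
		}
	}
	return nil
}
```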

* [UC1] Read metrics from nodes and write them to an external storage.
* [UC2] Expose metrics from the last 2-3 minutes (for HPA and GKE)
* [UC3] Read Events from the API server and write them to a permanent storage
* [UC4] Do some long-term (hours, days) metrics analysis to get stats (average, 95 percentile)
Contributor: The "max" value is also important here, because it lets you see if there was any activity at all (not particularly useful for things like CPU and memory, but for net, or certain custom metrics like hits per second, it could be used to determine if the pod was useful)

Contributor: Currently, max, avg and 95%ile are made available for the last minute, hour and day.

Author: Yep, max will be there too.
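
For concreteness, here is a minimal sketch of the window aggregation being discussed: max, average, and 95th percentile over one window of samples (e.g. the last minute, hour, or day). Names and structure are illustrative, not Heapster's actual aggregator.

```go
package stats

import (
	"math"
	"sort"
)

// WindowStats holds the aggregates discussed above for one metric window.
type WindowStats struct {
	Max, Avg, P95 float64
}

// Aggregate computes max, average, and the 95th percentile of samples
// (e.g. per-minute CPU usage values over the last hour or day).
func Aggregate(samples []float64) WindowStats {
	if len(samples) == 0 {
		return WindowStats{}
	}
	sorted := append([]float64(nil), samples...) // copy so the caller's slice stays unsorted
	sort.Float64s(sorted)

	sum := 0.0
	for _, s := range sorted {
		sum += s
	}
	// Nearest-rank percentile: index ceil(0.95*n) - 1.
	rank := int(math.Ceil(0.95*float64(len(sorted)))) - 1
	return WindowStats{
		Max: sorted[len(sorted)-1],
		Avg: sum / float64(len(sorted)),
		P95: sorted[rank],
	}
}
```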

@piosz (Contributor) commented Dec 10, 2015

cc @bryk

etc. present in the system.

There is also a HeapsterGKE API dedicated for GKE through which it’s possible to get a full
dump of all metrics (spanning last minute or two).
Contributor: Last two minutes as of now, to be specific.
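
Because the dump only spans the last two minutes, a consumer has to poll at least that often or it will drop samples. A minimal sketch of such a poller follows; the /api/v1/metric-export path is an assumption for illustration, not confirmed by this thread.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// pollDump fetches the full metrics dump on a fixed interval. Keeping the
// interval well under the two-minute window avoids gaps between dumps.
func pollDump(baseURL string, interval time.Duration) {
	for range time.Tick(interval) {
		resp, err := http.Get(baseURL + "/api/v1/metric-export") // assumed path
		if err != nil {
			fmt.Println("fetch failed:", err)
			continue
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Println("read failed:", err)
			continue
		}
		fmt.Printf("fetched %d bytes of metrics\n", len(body))
	}
}
```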

@bryk commented Dec 14, 2015

CC @joeatwork

@k8s-bot commented Dec 14, 2015

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

If this message is too spammy, please complain to ixdy.

@ncdc commented Dec 17, 2015

cc @kubernetes/rh-cluster-infra @jeremyeder @timothysc @smarterclayton @mwringe

@ncdc commented Dec 17, 2015

cc @kubernetes/rh-scalability

with 1h resolution), keeps them in memory and exposes via Heapster API. This API is mainly
used by Horizontal Pod Autoscaler which asks for the most recent performance related
metrics to adjust the number of pods to the incoming traffic. The API is also used by KubeDash
and will be used by the new UI (which will replace KubeDash) as well.
Contributor: Can you add a link to an issue that tracks the replacement? We should place that issue in the kubedash repo as well to make it clear for existing kubedash users.
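
As background on the HPA usage mentioned in the excerpt: the autoscaler's core rule is proportional, scaling the replica count by the ratio of observed to target utilization computed from the recent metrics it reads from Heapster. A simplified sketch of that rule (the real controller adds bounds, tolerances, and cooldowns):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the proportional scaling rule: grow or shrink the
// replica count by the ratio of observed to target utilization, rounding up.
func desiredReplicas(current int, observed, target float64) int {
	if current <= 0 || target <= 0 {
		return current
	}
	return int(math.Ceil(float64(current) * observed / target))
}

func main() {
	// 4 replicas at 90% observed CPU against a 60% target -> ceil(4*1.5) = 6.
	fmt.Println(desiredReplicas(4, 0.90, 0.60))
}
```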


## Custom Metrics Status

Heapster is not a generic solution for gathering arbitrary number of arbitrary-formated custom
Contributor: Why not? What if custom metrics are required for the GKE pipeline in the near future?

Author: Because we cannot scale enough to claim support for an arbitrary number of custom metrics. 100+ metrics per pod, with 1000 nodes and 30 pods on each node, works out to 1000 × 30 × 100 = 3 million concurrently tracked metric series; that would probably require a sharded/clustered Heapster, for which we will likely not get "time budget" anytime soon.

Contributor: You'd likely get more mileage out of contributing to existing projects, e.g. Prometheus, if this became a requirement.

Contributor: If we assume that none of those custom metrics are cached, do we still expect a huge resource impact from proxying metrics over to a sink?
@jimmidyson: What you suggest is certainly an alternative. Aside from simple aggregation, I'm not suggesting any additional features. Prometheus is a monitoring system by itself, whereas heapster is only an aggregation agent.

@mwielgus mentioned this pull request on Dec 23, 2015
(with support for CoreOS Fleet and flat file node lists).

Metrics collected by Heapster can be written into multiple kinds of storage - Influxdb,
OpenTSDB, Google Cloud Monitoring, Hawkular, Kaflka, Riemann, ElasticSearch (some of them are

Comment: s/Kaflka/Kafka

Author: Done.
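
The multi-backend list above is what motivates Heapster's pluggable sink design: each backend implements one small interface, so adding a storage backend does not touch the collection pipeline. A simplified sketch of that abstraction (type names approximate Heapster's core package but are not its exact API):

```go
package core

// MetricsBatch is a simplified stand-in for the batch of scraped metrics
// Heapster pushes to each configured sink.
type MetricsBatch struct {
	TimestampUnix int64
	MetricSets    map[string]map[string]float64 // entity key -> metric name -> value
}

// DataSink is a sketch of the pluggable sink abstraction: InfluxDB, OpenTSDB,
// Google Cloud Monitoring, Hawkular, Kafka, Riemann, and ElasticSearch would
// each provide an implementation of this same small interface.
type DataSink interface {
	Name() string
	ExportData(batch *MetricsBatch)
	Stop()
}
```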

@timothysc commented

re: events, this overlaps a lot with work folks want to do to enable direct sharding of data to kafka.

kubernetes/kubernetes#19637

@mwielgus (Author) commented Feb 2, 2016

re: kafka, thanks for the heads up.

@mwielgus (Author) commented Feb 2, 2016

Merging the proposal as is. Most of the proposal is already implemented in the heapster-scalability branch. For the remaining items I'm happy to set up separate issues/documents or have a VC. Please let me know if you feel a strong need to discuss a particular case.

@k8s-bot commented Feb 2, 2016

Jenkins GCE e2e

Build/test passed for commit c0d82aa.

mwielgus added a commit that referenced this pull request Feb 2, 2016
Heapster long term vision
mwielgus merged commit 0e06b6b into kubernetes-retired:master on Feb 2, 2016