with 1h resolution), keeps them in memory and exposes via Heapster API. This API is mainly
used by Horizontal Pod Autoscaler which asks for the most recent performance related
metrics to adjust the number of pods to the incoming traffic. The API is also used by KubeDash
(which unfortunately didn’t get enough traction and will be replaced) and will be used
This statement is vague. We as kube developers haven't put enough energy into making kubedash primetime ready. Why throw away a good piece of software, when no alternative exists?
Agree. I would just remove the bit about KubeDash.
OK, I can rephrase this sentence.
It was not my decision to start yet another UI. But this is the situation right now. Kubernetes Dashboard is actively developed (https://github.com/kubernetes/dashboard/graphs/contributors) by Google and Fujitsu, has a working prototype and will probably be delivered for 1.2 to some extent. On the other hand there is KubeDash, which has been "stable" since early October. It has some very specific requirements, like a 1-day-long cpu usage average, which may or may not be relevant once Kubernetes Dashboard becomes the default/main Kubernetes user interface.
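For context on the API under discussion, here is a minimal sketch of how a consumer such as HPA or a UI might read recent metrics. The in-cluster service address and the `/api/v1/model` path follow Heapster's conventions but should be treated as assumptions, not as the exact API of this proposal:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hypothetical in-cluster Heapster address; the model API path may
	// differ between Heapster versions -- treat both as assumptions.
	url := "http://heapster.kube-system/api/v1/model/" +
		"namespaces/default/pods/my-pod/metrics/cpu-usage"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	// The response is JSON: a series of {timestamp, value} points covering
	// the most recent window, which HPA-style consumers aggregate over.
	fmt.Println(string(body))
}
```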
* [UC1] Read metrics from nodes and write them to an external storage.
* [UC2] Expose metrics from the last 2-3 minutes (for HPA and GKE)
* [UC3] Read Events from the API server and write them to a permanent storage
What is the reason that event storage is part of heapster rather than a separate tool to do that?
Ah, I see the Eventer below.
Yep, right now metrics and events are combined into one tool but we are planning to split them.
The original plan was to combine events and metrics data and build more interesting signals for end users. I guess even if we split heapster into separate binaries, if we want to build such models, we will have to combine the data somewhere else.
AFAIK there are no immediate plans for any event/metrics combining. Once we decide to do it we can revisit this item and decide what is best:
- having a separate component
- having a sink that also listens to Kubernetes events
- gluing the two binaries back together (unlikely)
If you want to discuss this now, please schedule a VC.
events are too heavy imo, I would vote to not try to amalgamate, but leave that to back-end systems to ETL and learn from operational data.
* [UC1] Read metrics from nodes and write them to an external storage.
* [UC2] Expose metrics from the last 2-3 minutes (for HPA and GKE)
* [UC3] Read Events from the API server and write them to a permanent storage
* [UC4] Do some long-term (hours, days) metrics analysis to get stats (average, 95 percentile) |
The "max" value is also important here, because it lets you see if there was any activity at all (not particularly useful for things like CPU and memory, but for net, or certain custom metrics like hits per second, it could be used to determine if the pod was useful)
Currently, max, avg and 95%ile are made available for the last minute, hour and day.
Yep, max will be there too.
cc @bryk
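For readers following along, a sketch of the kind of window aggregation being discussed (max, average, nearest-rank 95th percentile). This is illustrative only; Heapster's real aggregation code differs:

```go
package main

import (
	"fmt"
	"sort"
)

// aggregate returns the max, average and 95th percentile of a window of
// samples -- the derived stats mentioned above.
func aggregate(samples []float64) (max, avg, p95 float64) {
	if len(samples) == 0 {
		return
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	max = sorted[len(sorted)-1]
	for _, s := range sorted {
		avg += s
	}
	avg /= float64(len(sorted))
	// Nearest-rank 95th percentile.
	idx := int(0.95 * float64(len(sorted)))
	if idx >= len(sorted) {
		idx = len(sorted) - 1
	}
	p95 = sorted[idx]
	return
}

func main() {
	max, avg, p95 := aggregate([]float64{1, 2, 3, 4, 100})
	fmt.Printf("max=%v avg=%v p95=%v\n", max, avg, p95)
}
```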
etc. present in the system.
There is also a HeapsterGKE API dedicated to GKE, through which it’s possible to get a full
dump of all metrics (spanning the last minute or two).
last two minutes as of now to be specific.
CC @joeatwork
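As a hedged illustration of what "full dump" means in practice: Heapster serves such a dump over a dedicated endpoint. The `/api/v1/metric-export` path below is my reading of Heapster's API layout and should be verified against the source before relying on it:

```sh
# Fetch the full dump of recent metrics for all containers in the cluster.
# Both the service address and the endpoint path are assumptions here.
curl http://heapster.kube-system/api/v1/metric-export
```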
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy.
cc @kubernetes/rh-cluster-infra @jeremyeder @timothysc @smarterclayton @mwringe
cc @kubernetes/rh-scalability
with 1h resolution), keeps them in memory and exposes via Heapster API. This API is mainly
used by Horizontal Pod Autoscaler which asks for the most recent performance related
metrics to adjust the number of pods to the incoming traffic. The API is also used by KubeDash
and will be used by the new UI (which will replace KubeDash) as well.
Can you add a link to an issue that tracks the replacement?
We should place that issue in the kubedash repo as well to make it clear for the existing kubedash users.
## Custom Metrics Status
Heapster is not a generic solution for gathering an arbitrary number of arbitrarily-formatted custom
Why not? What if custom metrics are required for the GKE pipeline in the near future?
Because we cannot scale enough to claim we can take an arbitrary number of custom metrics. 100+ metrics per pod with 1000 nodes and 30 pods on each node would probably require a sharded/clustered Heapster, for which we will likely not get "time budget" anytime soon.
You'd likely get more mileage out of contributing to existing projects, e.g. Prometheus, if this became a requirement.
If we assume that none of those custom-metrics are cached, do we still expect to have a huge resource impact to proxy metrics over to a sink?
@jimmidyson: What you suggest is an alternative for sure. Excepting a simple aggregation, I'm not suggesting any additional features. Prometheus is a monitoring system by itself, whereas heapster is only an aggregation agent.
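To make the scale argument above concrete, a back-of-envelope sketch using the figures quoted in the thread (the per-series memory cost is purely my own assumption, for illustration only):

```go
package main

import "fmt"

func main() {
	// Figures from the discussion above; per-series cost is an assumption.
	const (
		nodes          = 1000
		podsPerNode    = 30
		metricsPerPod  = 100
		bytesPerSeries = 1024 // assumed in-memory state per time series
	)
	series := nodes * podsPerNode * metricsPerPod
	fmt.Printf("%d distinct time series\n", series) // 3,000,000
	// Roughly 3 GB of resident state before any indexing overhead --
	// well beyond what a single Heapster instance is expected to hold.
	fmt.Printf("~%d MB of resident state\n", series*bytesPerSeries/(1<<20))
}
```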
(with support for CoreOS Fleet and flat file node lists).
Metrics collected by Heapster can be written into multiple kinds of storage - Influxdb,
OpenTSDB, Google Cloud Monitoring, Hawkular, Kaflka, Riemann, ElasticSearch (some of them are
s/Kaflka/Kafka
Done.
re: events, this overlaps a lot with work folks want to do to enable direct sharding of data to kafka.
re: kafka, thanks for the heads up.
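For completeness, a hedged example of wiring one of these sinks up on the command line. The `--source`/`--sink` flag convention and the influxdb sink syntax match Heapster's documented flags, but the option strings for the other sinks vary and should be checked against the sink docs:

```sh
# Launch Heapster with a Kubernetes source and an InfluxDB sink.
# Service URLs are placeholders for a typical in-cluster setup.
heapster \
  --source=kubernetes:https://kubernetes.default \
  --sink=influxdb:http://monitoring-influxdb:8086
```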
Merging the proposal as is. Most of the proposal is already implemented in the heapster-scalability branch. For the other stuff I'm happy to set up separate issues/documents or have a VC. Please let me know if you feel a strong need to discuss a particular case.