Proposal: Introduce Oldtimer #1124
Conversation
1. Using aggregated historical data to make size-related decisions
   (for example, idling requires looking for traffic over a long time period)

2. Providing a common interface to for users to view historical metrics
remove the "to" after interface
✔️
As hinted in #1129 I have my concerns about service-level metrics not being clearly separated. Or better, Heapster not holding up to its intent to not be a general-purpose monitoring system. This was previously touched on in #665 and I don't feel it ever actually happened. I'm worried that adding a "generic" read-path at this point will lead to Heapster eventually growing a full meta-QL for all its sinks. If it's trying to be the system through which writes happen, it just makes sense to read through it as well. Aside from the scope issue, this has many semantic issues, as different sinks have different evaluation behaviors (e.g. extrapolation), which can have drastic effects. I would be very interested in the use case for this and the benefit within the next 6 months.
Here's an example use case (in fact, this is an actual thing that we'd like to be able to do): You would like to write an auto-idler component for Kubernetes. This component would run at some sort of long-term regular interval (say, every night). For some set of scalables (e.g. RCs, deployments, etc.), it would use Oldtimer to check whether there had been any activity on a particular metric (e.g. network activity or hits on an HTTP server) over the past 24h. Any scalables with no traffic could then be idled, and an auto-unidler set up for them (a rough sketch follows at the end of this comment). Without Oldtimer, you could build this component to talk directly to one particular metrics sink, but that would mean that you could not upstream the component, since it would not work with just any Kubernetes deployment. Alternatively, we could keep longer-term aggregated metrics in Heapster, but the Heapster vision doc specifically talks about moving such functionality to a different component (which I think is valuable, for reasons that I talked about a bit in the proposal).
I really do not want a full query language for Heapster. This proposal is about a minimal set of aggregations that would be useful for writing Kubernetes components that deal with historical data (e.g. the idling controller talked about above).
So, hopefully this came across in the proposal, but if not I can make it clearer there: This should basically represent the lowest-common-denominator amongst the sinks. There should not be any "this sink does not support feature X, so fake it in the Oldtimer code". I did spend some time looking at what the different sinks are capable of, but let me know if I missed anything major.
```go
type MetricAggregationResult struct {
    Average *MetricResult
```
Is MetricResult the best structure to pass this data?
Would you consider having multiple "MetricAggregationResult" entries, each with a single data point for average, max, min, median, count, etc.?
I'm definitely open to changes here. Were you thinking something like:
```go
type MetricAggregationResultList struct {
    Items []MetricAggregationResult
}

type MetricAggregationResult struct {
    Average *MetricPoint
    Maximum *MetricPoint
    ...
}
```
That seems reasonable to me.
Whoops, that last comment needs bucket support:
```go
type MetricAggregationResultList struct {
    Items []MetricAggregationResult
}

type MetricAggregationResult struct {
    Buckets []MetricAggregationBucket

    // this below isn't strictly necessary, but puts the result into context nicely
    BucketSize time.Duration
}

type MetricAggregationBucket struct {
    Average *MetricPoint
    Maximum *MetricPoint
    ...
}
```
Alternatively, the timestamp could be moved into the MetricAggregationBucket type, with each field being just a numeric value.
Yeah, something like that. Although I would introduce start/end time to MetricAggregationBucket. The question is also whether you want to name all the aggregations upfront or have a map[string]MetricResult (I don't have a super strong opinion on that).
> Although I would introduce start/end time to MetricAggregationBucket
AFAICT, Influx doesn't actually provide an end time -- it just has a timestamp (otherwise I would have).
@bryk @kubernetes/dashboard-maintainers this is something you are interested in. PTAL
@piosz @DirectXMan12 I'll review this on behalf of the Dashboard UI team.
CC @taimir
Prior to the Heapster refactor, the Heapster model presented aggregations of
metrics over certain time periods (the last hour and day). Post-refactor, the
concern of presenting an interface for historical metrics was to be split into
a separate Heapster component: Oldtimer.
Will this be built into Heapster? Or maybe it'll be running in a separate Pod/Replication Controller/Service?
So, you'd probably want to run it separately from Heapster so that you could scale it, but we might also want to provide an easy option to run it as part of Heapster (some sort of "allinone" installation for quickly getting up and running).
I would suggest including it in the default deployment. I think that for 99% of deployments a single instance will be enough, and it shouldn't consume much memory.
Anyone who needs to perform large-scale data analysis will do it in some other way (not via this very basic API).
Sounds good to me.
LGTM
Looks nice. A few questions.
Ok, I've tried to address all of the comments. I've added an example aggregation query, made a few more things explicit, and rearranged the proposal a bit. Please let me know if I've missed anything ;-)
LGTM from the Dashboard UI perspective
concern of presenting an interface for historical metrics was to be split into
a separate Heapster component: Oldtimer.

Oldtimer will run as part of the main Heapster executable, and will present
Oldtimer should instead run as a separate executable/container in the main Heapster pod.
ah, ok, I think I misunderstood you above.
That would effectively mean that it would have to be on a different port. Why not just allow it to run in the same executable, if you're going to be running it as part of the same pod anyway, so that https://heapster.kube-system/api/v1/model is for the model, while https://heapster.kube-system/api/v1/historical is for Oldtimer?
Had to tweak the paths a bit so that go-restful could work with them (it turns out go-restful doesn't like wildcards in the middle of a path, and it's probably not the best idea as far as ambiguity is concerned, anyway).
Several different aggregations will be supported. Aggregations should be
performed in the metrics sink. If more aggregations later become supported
across all metrics sinks, the list can be expanded (and the API version
should probably be bumped, since the supported aggregations should be part of
Please note that we actually don't have a versioned API in Heapster. It's a kind of promise that we will try to support it in some backward-compatible way, but there is no guarantee. If you really want a versioned API, it should be part of the Resource Metrics API kubernetes/kubernetes#24253
I'll tweak the wording here. I figured since there was an API version in the path, the intention was that you'd have a versioned API at some point.
There will be another versioned API following Kubernetes standards (proposal linked in the previous comment). In particular, if you want the model/historical API to be fully versioned, it should go through the usual path: alpha -> beta -> stable. We don't want that here, to allow higher velocity at the cost of possible instability.
Let's imagine a situation where we created a pod with some name, it was deleted, and then, say, a week later we created another pod with the same name, which is a totally different pod. What will the semantics of historical metrics be in this case? Do you plan to also verify the pod UID somehow?
Good point. We probably want
But operating on pod UIDs is a bad user experience. I'm OK with saying (possibly in the first version): if you have two pods with the same name historically, the data will be mixed. A better approach is to return data only from the newest pod with the given name. For example, I created on Monday a pod with the name
This gets a bit complicated, since effectively we then have to know the appropriate pod UID for a pod name before making a query, so we have to make two queries against the database. That might not be so bad, but it does add another round trip for every query. If we're OK with the extra round trip, the optimal setup would probably be that you can refer to a pod UID to get a specific pod, or refer to a pod name to get the newest pod with that name.
I've added the pod-id-based endpoints, and added a note about the pod-name-based endpoints. I think there are some cases for certain backends in which we can get away with only one trip to the backend, but there are several cases where two will be required. For instance, to get raw metrics from Hawkular, you need to know the pod id, AFAICT (@mwringe et al. might be able to correct me on this). For aggregations, I'm not sure how Hawkular deals with a query which could return multiple time series, but I suspect we'll need it there as well.
This is a proposal for Oldtimer, the Heapster historical metrics access component. Oldtimer was originally proposed in the vision statement, but was not previously specified in any particular detail.
I've also clarified a couple of points for posterity's sake:
cc @fabxc I think at this point the proposal is pretty much ready for merge. @mwielgus anything else you want changed?
LGTM
Hmm, you shouldn't need to know the pod id to get the raw data; you should be able to do a query based on any of the labels. But there may have been an issue preventing this in the past. Can you get in touch with Stefan Negrea (he is stefan_n in #hawkular on freenode) to answer your questions?