
Proposal: Introduce Oldtimer #1124

Merged

merged 1 commit into kubernetes-retired:master from proposal/old-timer on Apr 29, 2016

Conversation

DirectXMan12
Contributor

This is a proposal for Oldtimer, the Heapster historical metrics
access component. Oldtimer was originally proposed in the vision
statement, but was not previously specified in any particular detail.

@k8s-bot

k8s-bot commented Apr 11, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in hack/jenkins/job-configs/kubernetes-jenkins-pull/ instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

1. Using aggregated historical data to make size-related decisions
(for example, idling requires looking for traffic over a long time period)

2. Providing a common interface to for users to view historical metrics

remove the "to" after interface

Contributor Author

✔️

@piosz
Contributor

piosz commented Apr 14, 2016

cc @fgrzadkowski

@fabxc

fabxc commented Apr 18, 2016

As hinted in #1129, I have concerns about service-level metrics not being clearly separated. Or rather, about Heapster not holding to its intent of not being a general-purpose monitoring system.

This was previously touched on in #665, and I don't feel it ever actually happened. I'm worried that adding a "generic" read-path at this point will lead to Heapster eventually growing a full meta-QL for all its sinks. If it's trying to be the system through which writes happen, it just makes sense to read through it as well.

Aside from the scope issue, this has many semantic issues, as different sinks have different evaluation behaviors (e.g. extrapolation), which can have drastic effects.
Also, all these sinks have different query languages. I see a lot of time being spent mapping features onto one another, and this becoming a general point of contention. You touched on that point already at the end of your document.

I would be very interested in the use case for this and the benefit within the next 6 months.
As I've heard, there are companies with very large-scale scheduling and auto-scaling that have hardly had a need for long-term data.
My instinct tells me to defer such features until it's absolutely clear what's required.

@DirectXMan12
Contributor Author

DirectXMan12 commented Apr 18, 2016

@fabxc

I would be very interested in the use case for this and the benefit within the next 6 months.
As I've heard, there are companies with very elaborate scheduling and auto-scaling that have hardly had a need for long-term data.
My instinct tells me to defer such features until it's absolutely clear what's required.

Here's an example use case (in fact, this is an actual thing that we'd like to be able to do):

You would like to write an auto-idler component for Kubernetes. This component would run at some regular long-term interval (say, every night). For some set of scalables (e.g. RCs, deployments, etc.), it would use Oldtimer to check whether there had been any activity on a particular metric (e.g. network activity, or hits on an HTTP server) over the past 24h. Any scalables with no traffic could then be idled, with an auto-unidler set up for them.
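
To make that concrete, here's a minimal sketch of the idler's core check. Everything in it is hypothetical: OldtimerClient, MaxFor, and the choice of network/rx_rate are stand-ins for whatever API this proposal ends up defining.

```go
// Minimal sketch of an auto-idler's core check (hypothetical API).
package idler

import (
    "fmt"
    "time"
)

// MetricPoint is a single timestamped value, mirroring the shapes
// discussed in this proposal.
type MetricPoint struct {
    Timestamp time.Time
    Value     uint64
}

// OldtimerClient is a hypothetical client for the historical API;
// MaxFor would ask the sink (via Oldtimer) for the maximum of the
// named metric across a scalable's pods over the given window.
type OldtimerClient interface {
    MaxFor(namespace, scalable, metric string, window time.Duration) (*MetricPoint, error)
}

// shouldIdle reports whether a scalable saw no traffic over the past 24h.
func shouldIdle(c OldtimerClient, namespace, scalable string) (bool, error) {
    max, err := c.MaxFor(namespace, scalable, "network/rx_rate", 24*time.Hour)
    if err != nil {
        return false, fmt.Errorf("checking %s/%s: %v", namespace, scalable, err)
    }
    // No data points, or a maximum of zero, means no observed traffic,
    // so the scalable is a candidate for idling.
    return max == nil || max.Value == 0, nil
}
```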

Without Oldtimer, you could build this component to talk directly to one particular metrics sink, but that would mean you could not upstream the component, since it would not work with just any Kubernetes deployment. Alternatively, we could keep longer-term aggregated metrics in Heapster itself, but the Heapster vision doc specifically talks about moving such functionality into a different component (which I think is valuable, for reasons I discussed a bit in the proposal).

I'm worried that adding a "generic" read-path at this point will lead to Heapster eventually growing a full meta-QL for all its sinks. If it's trying to be the system through which writes happen, it just makes sense to read through it as well.

I really do not want a full query language for Heapster. This proposal is about a minimal set of aggregations that would be useful for writing Kubernetes components that deal with historical data (e.g. the idling controller discussed above).

Also, all these sinks have different query languages. I see a lot of time being spent mapping features onto one another, and this becoming a general point of contention. You touched on that point already at the end of your document.

So, hopefully this came across in the proposal, but if not I can make it clearer there: This should basically represent the lowest-common-denominator amongst the sinks. There should not be any "this sink does not support feature X, so fake it in the Oldtimer code". I did spend some time looking at what the different sinks are capable of, but let me know if I missed anything major.

@DirectXMan12
Contributor Author

@fabxc you can also look at the original discussion around the Heapster long-term vision document (#769) to see some other proposed use cases for Oldtimer.


```go
type MetricAggregationResult struct {
    Average *MetricResult
```
Contributor

Is MetricResult the best structure to pass this data?

Contributor

Would you consider having multiple MetricAggregationResults, each with a single data point for average, max, min, median, count, etc.?

Contributor Author

I'm definitely open to changes here. Were you thinking something like:

```go
type MetricAggregationResultList struct {
    Items []MetricAggregationResult
}

type MetricAggregationResult struct {
    Average *MetricPoint
    Maximum *MetricPoint
    ...
}
```

That seems reasonable to me.

Contributor Author

@DirectXMan12 DirectXMan12 Apr 18, 2016

Whoops, that last comment needs bucket support:

```go
type MetricAggregationResultList struct {
    Items []MetricAggregationResult
}

type MetricAggregationResult struct {
    Buckets []MetricAggregationBucket
    // this below isn't strictly necessary, but puts the result into context nicely
    BucketSize time.Duration
}

type MetricAggregationBucket struct {
    Average *MetricPoint
    Maximum *MetricPoint
    ...
}
```

Contributor Author

Alternatively, the timestamp could be moved into the MetricAggregationBucket type, with each field then being just a numeric value.

Contributor

Yeah, something like that, although I would introduce start/end times to MetricAggregationBucket. The question is also whether you want to name all the aggregations up front or have a map[string]MetricResult (I don't have a super strong opinion on that).
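
For illustration, the map-based shape suggested here might look roughly like this (a sketch only: the StartTime/EndTime fields follow the suggestion above, the key names are assumptions, and MetricResult is reused from the snippet under review):

```go
type MetricAggregationBucket struct {
    // Start/end of the bucket's window, as suggested above.
    StartTime time.Time
    EndTime   time.Time
    // Aggregation name (e.g. "average", "max", "count") mapped to its
    // result, instead of one named field per aggregation.
    Aggregations map[string]MetricResult
}
```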

Contributor Author

Although I would introduce start/end time to MetricAggregationBucket

AFAICT, Influx doesn't actually provide an end time -- it just has a timestamp (otherwise I would have).

@piosz
Contributor

piosz commented Apr 19, 2016

@bryk @kubernetes/dashboard-maintainers this is something you might be interested in. PTAL

@bryk

bryk commented Apr 19, 2016

@piosz @DirectXMan12 I'll review this on behalf of the Dashboard UI team.

@cheld
Contributor

cheld commented Apr 19, 2016

CC @taimir

Prior to the Heapster refactor, the Heapster model presented aggregations of
metrics over certain time periods (the last hour and day). Post-refactor, the
concern of presenting an interface for historical metrics was to be split into
a separate Heapster component: Oldtimer.

Will this be built into Heapster? Or maybe it'll be running in a separate Pod/Replication Controller/Service?

Contributor Author

So, you'd probably want to run it separately from Heapster so that you could scale it, but we might also want to provide an easy option to run it as part of Heapster (some sort of "all-in-one" installation for quickly getting up and running).

Contributor

@mwielgus mwielgus Apr 19, 2016

I would suggest including it in the default deployment. I think that for 99% of deployments a single instance will be enough, and it shouldn't consume much memory.
Anyone who needs to perform large-scale data analysis will do it in some other way (not via this very basic API).

Contributor Author

Sounds good to me.


LGTM

@bryk

bryk commented Apr 19, 2016

Looks nice. A few questions.

@DirectXMan12
Contributor Author

DirectXMan12 commented Apr 19, 2016

OK, I've tried to address all of the comments. I've added an example aggregation query, made a few more things explicit, and rearranged the proposal a bit. Please let me know if I've missed anything ;-)

@bryk

bryk commented Apr 21, 2016

LGTM from the Dashboard UI perspective

concern of presenting an interface for historical metrics was to be split into
a separate Heapster component: Oldtimer.

Oldtimer will run as part of the main Heapster executable, and will present
Contributor

@mwielgus mwielgus Apr 21, 2016

Oldtimer should rather run as a separate executable/container in the main Heapster pod.

Contributor Author

ah, ok, I think I misunderstood you above.

Contributor Author

That would effectively mean that it would have to be on a different port. Why not just allow it to run in the same executable, if you're going to be running it as part of the same pod anyway, so that https://heapster.kube-system/api/v1/model is for the model, while https://heapster.kube-system/api/v1/historical is for Oldtimer?
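
A minimal sketch of that single-executable layout, using just the standard library to show the idea (the real wiring would register go-restful services instead, as the next comment notes; the stub handlers and port 8082 are assumptions):

```go
package main

import (
    "fmt"
    "log"
    "net/http"
)

// stub stands in for the real model and Oldtimer handlers.
func stub(name string) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "%s would handle %s\n", name, r.URL.Path)
    })
}

func main() {
    // One executable, one port: the model and historical APIs are
    // distinguished only by their path prefixes.
    mux := http.NewServeMux()
    mux.Handle("/api/v1/model/", stub("model"))
    mux.Handle("/api/v1/historical/", stub("oldtimer"))
    log.Fatal(http.ListenAndServe(":8082", mux))
}
```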

@DirectXMan12
Contributor Author

Had to tweak the paths a bit so that go-restful could work with them (it turns out go-restful doesn't like wildcards in the middle of a path, and that's probably not the best idea as far as ambiguity is concerned anyway).

@DirectXMan12 DirectXMan12 force-pushed the proposal/old-timer branch 3 times, most recently from 125774c to c4addee on April 21, 2016 20:46
Several different aggregations will be supported. Aggregations should be
performed in the metrics sink. If more aggregations later become supported
across all metrics sinks, the list can be expanded (and the API version
should probably be bumped, since the supported aggregations should be part of
Contributor

Please note that we don't actually have a versioned API in Heapster. It's a kind of promise that we will try to support it in some backward-compatible way, but there is no guarantee. If you really want a versioned API, it should be part of the Resource Metrics API (kubernetes/kubernetes#24253).

Contributor Author

I'll tweak the wording here. I figured since there was an API version in the path, the intention was that you'd have a versioned API at some point.

Contributor

There will be another versioned API following Kubernetes standards (the proposal is linked in the previous comment). In particular, if you want the model/historical APIs to be fully versioned, they should go through the alpha -> beta -> stable path. We don't want that here, so as to allow higher velocity at the cost of possible instability.

@piosz
Contributor

piosz commented Apr 26, 2016

Let's imagine a situation where we created a pod with some name, it was deleted, and then, say, a week later we created another pod with the same name that is a totally different pod. What will be the semantics of historical metrics in this case? Do you plan to also verify the pod UID somehow?

@DirectXMan12
Contributor Author

What will be the semantics of historical metrics in this case? Do you plan to also verify the pod UID somehow?

Good point. We probably want /pod-ids/{pod_id} and /pod-ids/{pod_id}/containers/{container} API endpoints. Keeping the namespace-pod API endpoint as well might be nice, though, since it mirrors the model API.
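
For concreteness, registering those two endpoints with go-restful (the routing library mentioned earlier in this thread) might look roughly like this; the handler body, the response shape, the /api/v1/historical prefix, and the port are all placeholders:

```go
package main

import (
    "log"
    "net/http"

    "github.com/emicklei/go-restful"
)

// podMetrics is a placeholder; the real handler would resolve the UID
// against the configured sink and return historical metrics.
func podMetrics(req *restful.Request, resp *restful.Response) {
    resp.WriteAsJson(map[string]string{
        "pod_id":    req.PathParameter("pod_id"),
        "container": req.PathParameter("container"),
    })
}

func main() {
    ws := new(restful.WebService)
    ws.Path("/api/v1/historical").Produces(restful.MIME_JSON)
    // The two endpoints discussed above: a pod by UID, and a container
    // within that pod.
    ws.Route(ws.GET("/pod-ids/{pod_id}").To(podMetrics))
    ws.Route(ws.GET("/pod-ids/{pod_id}/containers/{container}").To(podMetrics))
    restful.Add(ws)
    log.Fatal(http.ListenAndServe(":8082", nil))
}
```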

@piosz
Contributor

piosz commented Apr 27, 2016

But operating on pod UIDs is a bad user experience.

I'm OK with saying (possibly in the first version) that if you have had two pods with the same name historically, the data will be mixed. A better approach is to return data only from the newest pod with the given name. For example: I created a pod named my-name on Monday and killed it on Tuesday, then created another pod with the same name on Friday, and it is still running. When querying for historical data for the whole week, I'd like to get only the metrics for the second pod, so there won't be any data for Mon-Thu. WDYT?

@DirectXMan12
Contributor Author

A better approach is to return data only from the newest pod with the given name

This gets a bit complicated, since we then effectively have to know the appropriate pod UID for a pod name before making a query, so we have to make two queries against the database. That might not be so bad, but it does add another round trip for every query.

If we're OK with the extra round trip, the optimal setup would probably be that you can refer to a pod by UID to get a specific pod, or by name to get the newest pod with that name.
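
To make the round-trip cost concrete, here's a rough sketch of the two-step flow (the Sink interface and both method names are hypothetical stand-ins for whatever queries the real sinks support):

```go
package sketch

import "time"

// MetricPoint is a single timestamped value, as in the API sketches above.
type MetricPoint struct {
    Timestamp time.Time
    Value     uint64
}

// Sink is a hypothetical view of a metrics backend.
type Sink interface {
    // NewestPodUID resolves a pod name to the UID of the most recently
    // created pod with that name in the namespace.
    NewestPodUID(namespace, podName string) (string, error)
    // RawMetrics returns the stored points for one pod's metric.
    RawMetrics(podUID, metric string, start, end time.Time) ([]MetricPoint, error)
}

// metricsByPodName needs two round trips: one to resolve the name, and
// one to fetch the data for the resolved UID.
func metricsByPodName(s Sink, ns, name, metric string, start, end time.Time) ([]MetricPoint, error) {
    uid, err := s.NewestPodUID(ns, name) // round trip 1
    if err != nil {
        return nil, err
    }
    return s.RawMetrics(uid, metric, start, end) // round trip 2
}
```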

@DirectXMan12
Contributor Author

I've added the pod-ID-based endpoints, and added a note about the pod-name-based endpoints. I think there are some cases, for certain backends, in which we can get away with only one trip to the backend, but there are several cases where two will be required.

For instance, to get raw metrics from Hawkular, you need to know the pod ID, AFAICT (@mwringe et al. might be able to correct me on this). For aggregations, I'm not sure how Hawkular deals with a query that could return multiple time series, but I suspect we'll need it there as well.

@DirectXMan12
Contributor Author

I've also clarified a couple points for posterity's sake:

  • Oldtimer is intended to work only with data written in the Heapster storage schema (i.e. data originally written by Heapster)
  • Oldtimer is not meant to be a general-purpose query tool that maps arbitrary queries to the sink query language -- it's just supposed to support a basic set of aggregations on single time series.

cc @fabxc

I think at this point the proposal is pretty much ready for merge. @mwielgus anything else you want changed?

@mwielgus
Contributor

LGTM

@mwielgus mwielgus added the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2016
@mwielgus mwielgus merged commit e42a3c5 into kubernetes-retired:master Apr 29, 2016
@DirectXMan12 DirectXMan12 deleted the proposal/old-timer branch April 29, 2016 19:57
@mwringe
Contributor

mwringe commented May 3, 2016

For instance, to get raw metrics from Hawkular, you need to know the pod ID, AFAICT (@mwringe et al. might be able to correct me on this). For aggregations, I'm not sure how Hawkular deals with a query that could return multiple time series, but I suspect we'll need it there as well.

Hmm, you shouldn't need to know the pod ID to get the raw data; you should be able to query based on any of the labels. But there may have been an issue preventing this in the past.

Can you get in touch with Stefan Negrea (he is stefan_n in #hawkular on freenode) to answer your questions?
