
Proposal: Introduce Oldtimer #1124

Merged

merged 1 commit into kubernetes-retired:master from proposal/old-timer on Apr 29, 2016

Conversation

DirectXMan12
Contributor

This is a proposal for Oldtimer, the Heapster historical metrics
access component. Oldtimer was originally proposed in the vision
statement, but was not previously specified in any particular detail.

@k8s-bot

k8s-bot commented Apr 11, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in hack/jenkins/job-configs/kubernetes-jenkins-pull/ instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

1. Using aggregated historical data to make size-related decisions
(for example, idling requires looking for traffic over a long time period)

2. Providing a common interface to for users to view historical metrics

remove the "to" after interface

Contributor Author

✔️

@piosz
Contributor

piosz commented Apr 14, 2016

cc @fgrzadkowski

@fabxc

fabxc commented Apr 18, 2016

As hinted in #1129, I have concerns about service-level metrics not being clearly separated. Or rather, about Heapster not holding to its intent of not being a general-purpose monitoring system.

This was previously touched on in #665, and I don't feel it ever actually happened. I'm worried that adding a "generic" read-path at this point will lead to Heapster eventually growing a full meta-QL for all its sinks. If it's trying to be the system through which writes happen, it just makes sense to read through it as well.

Aside from the scope issue, this has many semantic issues, as different sinks have different evaluation behaviors (e.g. extrapolation), which can have drastic effects.
Also, all these sinks have different query languages. I see a lot of time being spent mapping features onto one another, and this becoming a general point of contention. You touched on that point already at the end of your document.

I would be very interested in the use case for this and the benefit within the next 6 months.
As I've heard, there are companies with very large-scale scheduling and auto-scaling that have hardly had a need for long-term data.
My instinct tells me to defer such features until it's absolutely clear what's required.

@DirectXMan12
Contributor Author

DirectXMan12 commented Apr 18, 2016

@fabxc

I would be very interested in the use case for this and the benefit within the next 6 months.
As I've heard, there are companies with very elaborate scheduling and auto-scaling that have hardly had a need for long-term data.
My instinct tells me to defer such features until it's absolutely clear what's required.

Here's an example use case (in fact, this is an actual thing that we'd like to be able to do):

You would like to write an auto-idler component for Kubernetes. This component would run at some regular long-term interval (say, every night). For some set of scalables (e.g. RCs, deployments, etc.), it would use Oldtimer to check whether there had been any activity on a particular metric (e.g. network activity, or hits on an HTTP server) over the past 24h. Any scalables with no traffic could then be idled, with an auto-unidler set up for them.
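
To make that concrete, here's a minimal sketch of the idler's core check. Everything in it is hypothetical: OldtimerClient, MaxFor, and the choice of network/rx_rate are stand-ins for whatever API this proposal ends up defining.

```go
// Minimal sketch of an auto-idler's core check (hypothetical API).
package idler

import (
    "fmt"
    "time"
)

// MetricPoint is a single timestamped value, mirroring the shapes
// discussed in this proposal.
type MetricPoint struct {
    Timestamp time.Time
    Value     uint64
}

// OldtimerClient is a hypothetical client for the historical API;
// MaxFor would ask the sink (via Oldtimer) for the maximum of the
// named metric across a scalable's pods over the given window.
type OldtimerClient interface {
    MaxFor(namespace, scalable, metric string, window time.Duration) (*MetricPoint, error)
}

// shouldIdle reports whether a scalable saw no traffic over the past 24h.
func shouldIdle(c OldtimerClient, namespace, scalable string) (bool, error) {
    max, err := c.MaxFor(namespace, scalable, "network/rx_rate", 24*time.Hour)
    if err != nil {
        return false, fmt.Errorf("checking %s/%s: %v", namespace, scalable, err)
    }
    // No data points, or a maximum of zero, means no observed traffic,
    // so the scalable is a candidate for idling.
    return max == nil || max.Value == 0, nil
}
```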

Without Oldtimer, you could build this component to talk directly to one particular metrics sink, but that would mean you could not upstream the component, since it would not work with just any Kubernetes deployment. Alternatively, we could keep longer-term aggregated metrics in Heapster itself, but the Heapster vision doc specifically talks about moving such functionality into a different component (which I think is valuable, for reasons I discussed a bit in the proposal).

I'm worried that adding a "generic" read-path at this point will lead to Heapster eventually growing a full meta-QL for all its sinks. If it's trying to be the system through which writes happen, it just makes sense to read through it as well.

I really do not want a full query language for Heapster. This proposal is about a minimal set of aggregations that would be useful for writing Kubernetes components that deal with historical data (e.g. the idling controller discussed above).

Also, all these sinks have different query languages. I see a lot of time being spent mapping features onto one another, and this becoming a general point of contention. You touched on that point already at the end of your document.

So, hopefully this came across in the proposal, but if not I can make it clearer there: This should basically represent the lowest-common-denominator amongst the sinks. There should not be any "this sink does not support feature X, so fake it in the Oldtimer code". I did spend some time looking at what the different sinks are capable of, but let me know if I missed anything major.

@DirectXMan12
Contributor Author

@fabxc you can also look at the original discussion around the Heapster long-term vision document (#769) to see some other proposed use cases for Oldtimer.


```go
type MetricAggregationResult struct {
    Average *MetricResult
```
Contributor

Is MetricResult the best structure to pass this data?

Contributor

Would you consider having multiple MetricAggregationResults, each with a single data point for average, max, min, median, count, etc.?

Contributor Author

I'm definitely open to changes here. Were you thinking something like:

```go
type MetricAggregationResultList struct {
    Items []MetricAggregationResult
}

type MetricAggregationResult struct {
    Average *MetricPoint
    Maximum *MetricPoint
    ...
}
```

That seems reasonable to me.

Contributor Author

@DirectXMan12 DirectXMan12 Apr 18, 2016

Whoops, that last comment needs bucket support:

```go
type MetricAggregationResultList struct {
    Items []MetricAggregationResult
}

type MetricAggregationResult struct {
    Buckets []MetricAggregationBucket
    // this below isn't strictly necessary, but puts the result into context nicely
    BucketSize time.Duration
}

type MetricAggregationBucket struct {
    Average *MetricPoint
    Maximum *MetricPoint
    ...
}
```

Contributor Author

Alternatively, the timestamp could be moved into the MetricAggregationBucket type, with each field then being just a numeric value.

Contributor

Yeah, something like that, although I would introduce start/end times to MetricAggregationBucket. The question is also whether you want to name all the aggregations up front or have a map[string]MetricResult (I don't have a super strong opinion on that).
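
For illustration, the map-based shape suggested here might look roughly like this (a sketch only: the StartTime/EndTime fields follow the suggestion above, the key names are assumptions, and MetricResult is reused from the snippet under review):

```go
type MetricAggregationBucket struct {
    // Start/end of the bucket's window, as suggested above.
    StartTime time.Time
    EndTime   time.Time
    // Aggregation name (e.g. "average", "max", "count") mapped to its
    // result, instead of one named field per aggregation.
    Aggregations map[string]MetricResult
}
```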

Contributor Author

Although I would introduce start/end time to MetricAggregationBucket

AFAICT, Influx doesn't actually provide an end time -- it just has a timestamp (otherwise I would have).

@piosz
Contributor

piosz commented Apr 19, 2016

@bryk @kubernetes/dashboard-maintainers this is something you might be interested in. PTAL

@bryk

bryk commented Apr 19, 2016

@piosz @DirectXMan12 I'll review this on behalf of the Dashboard UI team.

@cheld
Contributor

cheld commented Apr 19, 2016

CC @taimir

Prior to the Heapster refactor, the Heapster model presented aggregations of
metrics over certain time periods (the last hour and day). Post-refactor, the
concern of presenting an interface for historical metrics was to be split into
a separate Heapster component: Oldtimer.

Will this be built into Heapster? Or maybe it'll be running in a separate Pod/Replication Controller/Service?

Contributor Author

So, you'd probably want to run it separately from Heapster so that you could scale it, but we might also want to provide an easy option to run it as part of Heapster (some sort of "all-in-one" installation for quickly getting up and running).

Contributor

@mwielgus mwielgus Apr 19, 2016

I would suggest including it in the default deployment. I think that for 99% of deployments a single instance will be enough, and it shouldn't consume much memory.
Anyone who needs to perform large-scale data analysis will do it in some other way (not via this very basic API).

Contributor Author

Sounds good to me.


LGTM

@bryk

bryk commented Apr 19, 2016

Looks nice. A few questions.

@DirectXMan12
Contributor Author

DirectXMan12 commented Apr 19, 2016

OK, I've tried to address all of the comments. I've added an example aggregation query, made a few more things explicit, and rearranged the proposal a bit. Please let me know if I've missed anything ;-)

@bryk

bryk commented Apr 21, 2016

LGTM from the Dashboard UI perspective

concern of presenting an interface for historical metrics was to be split into
a separate Heapster component: Oldtimer.

Oldtimer will run as part of the main Heapster executable, and will present
Contributor

@mwielgus mwielgus Apr 21, 2016

Oldtimer should rather run as a separate executable/container in the main Heapster pod.

Contributor Author

ah, ok, I think I misunderstood you above.

Contributor Author

That would effectively mean that it would have to be on a different port. Why not just allow it to run in the same executable, if you're going to be running it as part of the same pod anyway, so that https://heapster.kube-system/api/v1/model is for the model, while https://heapster.kube-system/api/v1/historical is for Oldtimer?
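
A minimal sketch of that single-executable layout, using just the standard library to show the idea (the real wiring would register go-restful services instead, as the next comment notes; the stub handlers and port 8082 are assumptions):

```go
package main

import (
    "fmt"
    "log"
    "net/http"
)

// stub stands in for the real model and Oldtimer handlers.
func stub(name string) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "%s would handle %s\n", name, r.URL.Path)
    })
}

func main() {
    // One executable, one port: the model and historical APIs are
    // distinguished only by their path prefixes.
    mux := http.NewServeMux()
    mux.Handle("/api/v1/model/", stub("model"))
    mux.Handle("/api/v1/historical/", stub("oldtimer"))
    log.Fatal(http.ListenAndServe(":8082", mux))
}
```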

@DirectXMan12
Contributor Author

Had to tweak the paths a bit so that go-restful could work with them (it turns out go-restful doesn't like wildcards in the middle of a path, and that's probably not the best idea as far as ambiguity is concerned anyway).

@DirectXMan12 DirectXMan12 force-pushed the proposal/old-timer branch 3 times, most recently from 125774c to c4addee on April 21, 2016 20:46
Several different aggregations will be supported. Aggregations should be
performed in the metrics sink. If more aggregations later become supported
across all metrics sinks, the list can be expanded (and the API version
should probably be bumped, since the supported aggregations should be part of
Contributor

Please note that we don't actually have a versioned API in Heapster. It's a kind of promise that we will try to support it in some backward-compatible way, but there is no guarantee. If you really want a versioned API, it should be part of the Resource Metrics API (kubernetes/kubernetes#24253).

Contributor Author

I'll tweak the wording here. I figured since there was an API version in the path, the intention was that you'd have a versioned API at some point.

Contributor

There will be another versioned API following Kubernetes standards (the proposal is linked in the previous comment). In particular, if you want the model/historical APIs to be fully versioned, they should go through the alpha -> beta -> stable path. We don't want that here, so as to allow higher velocity at the cost of possible instability.

@piosz
Contributor

piosz commented Apr 26, 2016

Let's imagine a situation where we created a pod with some name, it was deleted, and then, say, a week later we created another pod with the same name that is a totally different pod. What will be the semantics of historical metrics in this case? Do you plan to also verify the pod UID somehow?

@DirectXMan12
Contributor Author

What will be the semantics of historical metrics in this case? Do you plan to also verify the pod UID somehow?

Good point. We probably want /pod-ids/{pod_id} and /pod-ids/{pod_id}/containers/{container} API endpoints. Keeping the namespace-pod API endpoint as well might be nice, though, since it mirrors the model API.
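
For concreteness, registering those two endpoints with go-restful (the routing library mentioned earlier in this thread) might look roughly like this; the handler body, the response shape, the /api/v1/historical prefix, and the port are all placeholders:

```go
package main

import (
    "log"
    "net/http"

    "github.com/emicklei/go-restful"
)

// podMetrics is a placeholder; the real handler would resolve the UID
// against the configured sink and return historical metrics.
func podMetrics(req *restful.Request, resp *restful.Response) {
    resp.WriteAsJson(map[string]string{
        "pod_id":    req.PathParameter("pod_id"),
        "container": req.PathParameter("container"),
    })
}

func main() {
    ws := new(restful.WebService)
    ws.Path("/api/v1/historical").Produces(restful.MIME_JSON)
    // The two endpoints discussed above: a pod by UID, and a container
    // within that pod.
    ws.Route(ws.GET("/pod-ids/{pod_id}").To(podMetrics))
    ws.Route(ws.GET("/pod-ids/{pod_id}/containers/{container}").To(podMetrics))
    restful.Add(ws)
    log.Fatal(http.ListenAndServe(":8082", nil))
}
```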

@piosz
Contributor

piosz commented Apr 27, 2016

But operating on pod UIDs is a bad user experience.

I'm OK with saying (possibly in the first version) that if you have had two pods with the same name historically, the data will be mixed. A better approach is to return data only from the newest pod with the given name. For example: I created a pod named my-name on Monday and killed it on Tuesday, then created another pod with the same name on Friday, and it is still running. When querying for historical data for the whole week, I'd like to get only the metrics for the second pod, so there won't be any data for Mon-Thu. WDYT?

@DirectXMan12
Contributor Author

A better approach is to return data only from the newest pod with the given name

This gets a bit complicated, since we then effectively have to know the appropriate pod UID for a pod name before making a query, so we have to make two queries against the database. That might not be so bad, but it does add another round trip for every query.

If we're OK with the extra round trip, the optimal setup would probably be that you can refer to a pod by UID to get a specific pod, or by name to get the newest pod with that name.
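
To make the round-trip cost concrete, here's a rough sketch of the two-step flow (the Sink interface and both method names are hypothetical stand-ins for whatever queries the real sinks support):

```go
package sketch

import "time"

// MetricPoint is a single timestamped value, as in the API sketches above.
type MetricPoint struct {
    Timestamp time.Time
    Value     uint64
}

// Sink is a hypothetical view of a metrics backend.
type Sink interface {
    // NewestPodUID resolves a pod name to the UID of the most recently
    // created pod with that name in the namespace.
    NewestPodUID(namespace, podName string) (string, error)
    // RawMetrics returns the stored points for one pod's metric.
    RawMetrics(podUID, metric string, start, end time.Time) ([]MetricPoint, error)
}

// metricsByPodName needs two round trips: one to resolve the name, and
// one to fetch the data for the resolved UID.
func metricsByPodName(s Sink, ns, name, metric string, start, end time.Time) ([]MetricPoint, error) {
    uid, err := s.NewestPodUID(ns, name) // round trip 1
    if err != nil {
        return nil, err
    }
    return s.RawMetrics(uid, metric, start, end) // round trip 2
}
```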

@DirectXMan12
Contributor Author

I've added the pod-ID-based endpoints, and added a note about the pod-name-based endpoints. I think there are some cases, for certain backends, in which we can get away with only one trip to the backend, but there are several cases where two will be required.

For instance, to get raw metrics from Hawkular, you need to know the pod ID, AFAICT (@mwringe et al. might be able to correct me on this). For aggregations, I'm not sure how Hawkular deals with a query that could return multiple time series, but I suspect we'll need it there as well.

@DirectXMan12
Contributor Author

I've also clarified a couple points for posterity's sake:

  • Oldtimer is intended to work only with data written in the Heapster storage schema (i.e. data originally written by Heapster)
  • Oldtimer is not meant to be a general-purpose query tool that maps arbitrary queries to the sink query language -- it's just supposed to support a basic set of aggregations on single time series.

cc @fabxc

I think at this point the proposal is pretty much ready for merge. @mwielgus anything else you want changed?

@mwielgus
Contributor

LGTM

@mwielgus mwielgus added the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2016
@mwielgus mwielgus merged commit e42a3c5 into kubernetes-retired:master Apr 29, 2016
@DirectXMan12 DirectXMan12 deleted the proposal/old-timer branch April 29, 2016 19:57
@mwringe
Contributor

mwringe commented May 3, 2016

For instance, to get raw metrics from Hawkular, you need to know the pod ID, AFAICT (@mwringe et al. might be able to correct me on this). For aggregations, I'm not sure how Hawkular deals with a query that could return multiple time series, but I suspect we'll need it there as well.

Hmm, you shouldn't need to know the pod ID to get the raw data; you should be able to query based on any of the labels. But there may have been an issue preventing this in the past.

Can you get in touch with Stefan Negrea (he is stefan_n in #hawkular on freenode) to answer your questions?
