This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Rework graphite docs #1796

Merged 3 commits on Apr 23, 2020
38 changes: 23 additions & 15 deletions README.md
@@ -6,7 +6,7 @@

## Introduction

Metrictank is a multi-tenant timeseries engine for Graphite and friends.
Metrictank is a multi-tenant timeseries platform that can be used as a backend or replacement for Graphite.
It provides long term storage, high availability, efficient storage, retrieval and processing for large scale environments.

[GrafanaLabs](http://grafana.com) has been running metrictank in production since December 2015.
@@ -17,37 +17,45 @@ that makes this process much easier.
## Features

* 100% open source
* Inspired by the [Facebook gorilla paper](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf).
Most notably, the heavily compressed chunks dramatically lower cpu, memory and storage requirements.
* Writeback RAM cache, serving most data out of memory.
* Graphite is a first class citizen. As of graphite-1.0.1, metrictank can be used as a graphite CLUSTER_SERVER.
* Can also act as a Graphite server itself, though the functions processing library is only partially implemented, metrictank proxies requests to Graphite if it can't handle the required processing (for those requests it will degrade to just being the backend storage)
* Accurate, flexible rollups by storing min/max/sum/count (which also gives us average).
* Heavily compressed chunks (inspired by the [Facebook gorilla paper](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf)) dramatically lower cpu, memory, and storage requirements and get much greater performance out of Cassandra than other solutions.
* Writeback RAM buffers and chunk caches, serving most data out of memory.
* Multiple rollup functions can be configured per series (or group of series), e.g. min/max/sum/count/average, which can be selected at query time via consolidateBy().
So we can do consolidation (combined runtime+archived) accurately and correctly,
[unlike most other graphite backends like whisper](https://grafana.com/blog/2016/03/03/25-graphite-grafana-and-statsd-gotchas/#runtime.consolidation)
* Flexible tenancy: can be used as single tenant or multi tenant. Selected data can be shared across all tenants.
* Input options: carbon, metrics2.0, kafka (soon: json or msgpack over http)
* Guards against excessive data requests
* Input options: carbon, metrics2.0, kafka.
* Guards against excessively large queries (per-request series/points restrictions).
* Data backfill/import from whisper
* Speculative Execution means you can use replicas not only for High Availability but also to reduce query latency.
* Write-Ahead buffer based on Kafka facilitates robust clustering and enables other analytics use cases.
* Tags and Meta Tags support
* Render response metadata: performance statistics, series lineage information and rollup indicator visible through Grafana
* Index pruning (hide inactive/stale series)
* Timeseries can change resolution (interval) over time; they are merged seamlessly at read time, with no need for data migrations.
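
The rollup bullet above says that consolidation can be done accurately when several aggregates are kept per rollup. The sketch below illustrates why, using a hypothetical `Rollup` record (the names `Rollup`, `rollup` and `merge` are illustrative and are not Metrictank's internal types): keeping sum and count lets a later consolidation recover the true average, whereas averaging stored averages does not.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Rollup:
    """Hypothetical rollup record holding several aggregates for one interval."""
    lo: float
    hi: float
    total: float
    count: int

    @property
    def avg(self) -> float:
        # The average is derived from sum and count rather than stored.
        return self.total / self.count


def rollup(points: Iterable[float]) -> Rollup:
    """Aggregate the raw points of one interval."""
    pts = list(points)
    return Rollup(lo=min(pts), hi=max(pts), total=sum(pts), count=len(pts))


def merge(rollups: List[Rollup]) -> Rollup:
    """Consolidate several rollups into one without losing accuracy."""
    return Rollup(
        lo=min(r.lo for r in rollups),
        hi=max(r.hi for r in rollups),
        total=sum(r.total for r in rollups),
        count=sum(r.count for r in rollups),
    )


# Two intervals with a different number of points each.
a = rollup([1.0, 2.0, 3.0])
b = rollup([10.0])

merged = merge([a, b])
naive_avg = (a.avg + b.avg) / 2  # averaging averages weights both intervals equally

print(merged.avg, naive_avg)
```

Here the merged average weights every raw point equally, while the naive average over-weights the interval that holds a single point.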

## Relation to Graphite

The goal of Metrictank is to provide a more scalable, secure, resource efficient and performant version of Graphite that is backwards compatible, while also adding some novel functionality.
(see Features, above)

There are two main ways to deploy Metrictank:
* as a backend for Graphite-web, by setting the `CLUSTER_SERVER` configuration value.
* as an alternative to a Graphite stack. This enables most of the additional functionality. Note that Metrictank's API is not quite on par yet with Graphite-web: some less commonly used functions are not implemented natively yet, in which case Metrictank relies on a graphite-web process to handle those requests. See [our graphite comparison page](docs/graphite.md) for more details.
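
For the first deployment mode, a graphite-web `local_settings.py` fragment might look like the sketch below. It assumes that graphite-web spells the setting `CLUSTER_SERVERS` (a list of `host:port` strings) and that a Metrictank instance is reachable at `localhost:6060`; the host and port are placeholders, not values taken from this document.

```python
# local_settings.py (graphite-web) -- illustrative fragment only.
# Assumption: Metrictank's HTTP API is reachable at localhost:6060.
# Replace with the address of your own Metrictank instance(s).
CLUSTER_SERVERS = [
    "localhost:6060",
]
```

With a setting like this, graphite-web keeps doing the function processing and uses Metrictank purely as the storage backend.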

## Limitations

* No performance/availability isolation between tenants per instance. (only data isolation)
* Minimum computation locality: we move the data from storage to processing code, which is both metrictank and graphite.
* Backlog replaying and queries can be made faster. [A Go GC issue may occasionally inflate response times](https://github.com/golang/go/issues/14812).
* We use metrics2.0 in native input protocol and indexes, but [barely do anything with it yet](https://github.com/grafana/metrictank/blob/master/docs/tags.md).
* can't overwrite old data. We support reordering the most recent time window but that's it. (unless you restart MT)
* Can't overwrite old data. We support reordering the most recent time window but that's it. (unless you restart MT)

## Interesting design characteristics (feature or limitation... up to you)

* Upgrades / process restarts require running multiple instances (potentially only for the duration of the maintenance) and possibly re-assigning the primary role.
Otherwise data loss of current chunks will be incurred. See [operations guide](https://github.com/grafana/metrictank/blob/master/docs/operations.md)
* Clustering works best with an orchestrator like Kubernetes. MT itself does not automate master promotions. See [clustering](https://github.com/grafana/metrictank/blob/master/docs/clustering.md) for more.
* Only float64 values. Ints and bools are currently stored as floats (works quite well due to the gorilla compression).
No text support.
* Only uint32 unix timestamps in second resolution. For higher resolution, consider [streaming directly to grafana](https://grafana.com/blog/2016/03/31/using-grafana-with-intels-snap-for-ad-hoc-metric-exploration/)
* No data locality: doesn't seem needed yet to put related series together.

* We distribute data by hashing keys, like many similar systems. This means no data locality (data that is often used together may not live together).

## Docs

28 changes: 15 additions & 13 deletions docs/graphite.md
@@ -1,18 +1,19 @@
# Graphite

Metrictank aims to be a drop-in replacement for Graphite, but also to address a few of Graphite's shortcomings.
Here are some important functional differences to keep in mind:
(we specifically do not go into subjective things like performance or scalability here)
For a general overview of how Metrictank relates and compares to Graphite, please see the [Readme](../README.md).

* currently no support for rewriting old data; for a given key and timestamp first write wins, not last. We aim to fix this.
* timeseries can change resolution (interval) over time, they will be merged seamlessly at read time.
* multiple rollup functions are supported and can be selected via consolidateBy() at query time. (except when using functions which change the nature of the data such as perSecond() etc)
* xFilesfactor is currently not supported
* will never move observations into the past (e.g. consolidation and rollups will only cause data to get an equal or higher timestamp)
* graphite timezone defaults to Chicago, we default to server time
* many functions are not implemented yet in metrictank itself, but it autodetects this and will proxy requests it cannot handle to graphite-web
(which then uses metrictank as a simple backend). See below for details
## Caveats

There are some small behavioral and functional differences with Graphite:

* Currently no support for rewriting old data. There is a reorder-buffer to support out-of-order writes to an extent. Full archived data rewriting is on the roadmap.
* Will never move observations into the past (e.g. consolidation and rollups will only cause data to get an equal or higher timestamp)
* Graphite timezone defaults to Chicago, we default to server time
* xFilesfactor is currently not supported for rollups. It is fairly easy to address, but we haven't had a need for it yet.
* Graphite supports the following render formats: csv, json, dygraph, msgpack, pickle, png, pdf, raw, rickshaw, and svg.
Metrictank only implements json, msgp, msgpack, and pickle. Grafana only uses json. In particular, Metrictank does not render images, because Grafana already handles rendering well.
* Some less commonly used functions are not implemented yet in Metrictank itself, but Metrictank can seamlessly proxy those to graphite-web (see below for details).
At Grafana Labs, 90 to 95% of requests get handled by Metrictank without involving Graphite.


## Processing functions
@@ -27,7 +28,8 @@ There are 3 levels of support:
* Stable : 100% compatible with graphite and vetted
* Unstable: not fully compatible yet or not vetted enough

When you request functions that metrictank cannot provide, it will automatically proxy requests to graphite for a seamless failover.
When you request functions that Metrictank cannot provide, it will automatically and seamlessly proxy the request to Graphite.
Those requests will not include response metadata, will still use Metrictank as a storage system if Graphite is configured that way, and may be a bit slower to return.
You can also choose to enable unstable functions via `process=any`.
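
As a rough illustration of the `process` parameter mentioned above, the sketch below builds a render request URL. The base URL `http://localhost:6060`, the series name `some.metric`, and the helper `build_render_url` are assumptions made for this example; the `/render` path with `target`, `from` and `format` parameters follows the usual Graphite render API shape, and the `process=any` value is taken from the text above. Check the HTTP API docs for the values your version accepts.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_render_url(base_url: str, target: str, from_: str = "-1h",
                     process: Optional[str] = None) -> str:
    """Build a Graphite-style render URL (hypothetical helper)."""
    params = [("target", target), ("from", from_), ("format", "json")]
    if process is not None:
        # Opt in to a different processing mode, e.g. unstable functions.
        params.append(("process", process))
    return f"{base_url.rstrip('/')}/render?{urlencode(params)}"


url = build_render_url(
    "http://localhost:6060",               # assumed Metrictank address
    "consolidateBy(some.metric, 'max')",   # hypothetical series name
    process="any",
)

# Decode the query string again to show what the server would receive.
query = parse_qs(urlsplit(url).query)
print(url)
print(query["target"][0], query["process"][0])
```

The URL can then be fetched with any HTTP client; requests that Metrictank cannot process natively are proxied to Graphite as described above.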
See also:
* [HTTP api docs for render endpoint](https://github.com/grafana/metrictank/blob/master/docs/http-api.md#graphite-query-api)
@@ -59,7 +61,7 @@ See also:
| changed | | No |
| color | | No |
| consolidateBy(seriesList, func) seriesList | | Stable |
| constantLine | | Stable |
| constantLine | | No |
| countSeries(seriesLists) series | | Stable |
| cumulative | | Stable |
| currentAbove | | Stable |