
meta-tags (previously known as extrinsic tags) #660

Closed
shalstea opened this issue Jun 21, 2017 · 22 comments

@shalstea

Metrics 2.0 supports adding metadata to metrics, but at the cost of network bandwidth. A lot of metadata can be very static (e.g. the data center a machine is in). It would be very nice to have a means of bulk-loading / updating static metadata and having it merge in with tags.

For example, every metric might have a host tag. Associated with the host is a collection of static data: cluster, data center, OS, OS version, etc. We would like to feed this in. From Grafana this would appear as tags on the metric.

@Dieterbe
Contributor

Dieterbe commented Jun 22, 2017

these are all known issues.

  1. all metadata currently transmitted (especially in the mdm format) is very redundant and too resource intensive (in network bandwidth, but also in (de)serialization overhead in the form of CPU time and memory allocations)
  2. any metadata should thus be able to be sent/maintained asynchronously from the data stream (see metricdefinition refactor #199 for some concrete ideas on how to address this)
  3. this applies to both intrinsic and extrinsic properties (tags that affect the metric id and tags that don't)
  4. as for exposing to grafana, I see 2 main ways: A) extending the query language to support tag-based searching/filtering etc. B) a custom datasource for metrictank that is a superset of the graphite datasource, but with extensions for displaying tags in the editor etc.

related : #352

@shalstea
Author

shalstea commented Jun 30, 2017

We would love to be able to add detailed descriptions to metrics, with some way to access them by clicking on a visual indicator on a panel that uses the metric. Users would then be able to really understand what the metric means.

Other static data would include things like units, whether to graph as a rate, etc.

@Dieterbe
Contributor

Dieterbe commented Jul 1, 2017

you're describing a feature request for grafana. maybe @daniellee or @torkelo can advise where to direct that topic.
I believe this at least partially overlaps with grafana/grafana#1153

@TheStigB

Some additional details about the metadata / extrinsic tags.

The goal is to provide better filtering and group-by capabilities in Grafana / metrictank, by being able to augment the core tags with additional tags / metadata that should work like any first-class tag in Grafana.

So we would like to be able to upload, on a daily or weekly basis, a set of tags that refer to a core tag.
Example (the first column is the primary tag, the remaining columns are additional tags):

| HostId | Rollout Stage | OS | Location | ... |
| --- | --- | --- | --- | --- |
| 100 | S1 | RHEL7.1 | DataCenter1 | |

Initially it's ok if we only have one version of the extrinsic tags; having historical versions and handling changes in them over time is a nice-to-have, but not a requirement at this point.
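For illustration, here is a minimal Go sketch of what one such upload row could look like (all names are hypothetical, not actual metrictank types):

```go
package main

import "fmt"

// metaTagUpload is a hypothetical shape for one row of the bulk upload:
// every series carrying the primary tag gets the additional tags merged in.
type metaTagUpload struct {
	primaryTag     string            // e.g. "HostId=100"
	additionalTags map[string]string // tags to expose as first-class tags
}

func main() {
	row := metaTagUpload{
		primaryTag: "HostId=100",
		additionalTags: map[string]string{
			"RolloutStage": "S1",
			"OS":           "RHEL7.1",
			"Location":     "DataCenter1",
		},
	}
	fmt.Printf("%s -> %v\n", row.primaryTag, row.additionalTags)
}
```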

@Dieterbe
Contributor

The big question here, I think, is how we want to do the associations.
Am I reading this right that you'd like to assign the additional tags by hooking them onto pre-existing tags?
for example, assigning OS and location tags to metrics by specifying a hostId tag, such that all metrics with that hostId get the tags?
implementation-wise it could take the form of a set of dynamic high-level rules that we take into account when we query the index, or we could actually go and apply all these tags to every single metric.

looping in @DanCech; we've previously discussed this but I don't remember the outcome.

@TheStigB

You got it. Basically a "table join" on a given tag between the provided (per metric point) tags and the additional tags.

As a future enhancement it might be interesting to have two primary tags, so we can limit additional tags to a given namespace. (No need to do this for the MVP.)

As for "the upload", probably the easiest is via a special message / topic over Kafka. I don't think one upload needs to be atomic; each individual "key tag value" and its "additional tags" can be handled independently from the other tag values.

@DanCech
Contributor

DanCech commented Jul 12, 2018

After some lengthy discussions, we came up with a concept that may work for this functionality.

We would start by adding a separate index to hold meta-records that map tag queries (used to identify target series) to a set of "extrinsic" tag/value pairs to be added to those series.

We would then add a second reverse index to allow looking those records up by tag & value, in the same way as the existing reverse index is used to look up series.

The existing index would be augmented by adding a list of the meta-records associated with each series.

When altering the meta-records the system would look up the associated series (by executing the tag queries against the primary reverse index) and update their lists.

When adding a series, it would be compared against each entry in the meta-index to build the list.

We may also want to maintain in the meta-index a list of all series that are associated with each record, since we already have to do the work to produce the lists of meta-records associated with each series. This would be a cost in terms of index size, but would potentially be a big performance boost at query time (see below).

When executing a query, there are quite a few complex scenarios that would need to be dealt with, mostly around how to deal with query conditions. Basically we would need to pick a query condition that requires a non-empty value (as we do already), then do a lookup for that condition in both reverse indexes. Series matched from the primary index would be added to the prospective result set as normal, while any results from the lookup in the second reverse index would be used to look up series associated with the meta-records (either by executing the tag queries against the main reverse index or by getting a list of matching series directly if they were stored in the meta-record), and those would also be added to the prospective result set.

At this point each entry in the result set would need to be "enriched" with the tags from the associated meta-records by walking the list of meta-records in each series (we need to determine how conflicts would be handled when a meta-record contains extrinsic tag values that conflict with other meta-records associated with the series or with intrinsic tags), then we would be able to apply the rest of the query conditions to filter the enriched result down to the final set of series.
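A sketch of the data structures this design implies, in Go; all names are hypothetical and this is not the actual metrictank implementation:

```go
package main

import "fmt"

// metaRecord pairs tag queries (identifying target series) with the set of
// extrinsic tag/value pairs to be added to those series.
type metaRecord struct {
	queries   []string          // run against the primary index, e.g. ["host=~dc1-.*"]
	extrinsic map[string]string // extrinsic tags to attach, e.g. {"datacenter": "dc1"}
}

type seriesID string

// metaIndex holds the records plus the second reverse index described above:
// extrinsic tag -> value -> records defining that pair.
type metaIndex struct {
	records []metaRecord
	reverse map[string]map[string][]int
	// optionally: record index -> associated series, trading index size
	// for skipping re-execution of the tag queries at query time.
	series map[int][]seriesID
}

func (m *metaIndex) add(r metaRecord) int {
	id := len(m.records)
	m.records = append(m.records, r)
	for tag, val := range r.extrinsic {
		if m.reverse[tag] == nil {
			m.reverse[tag] = map[string][]int{}
		}
		m.reverse[tag][val] = append(m.reverse[tag][val], id)
	}
	return id
}

func main() {
	m := &metaIndex{reverse: map[string]map[string][]int{}, series: map[int][]seriesID{}}
	m.add(metaRecord{
		queries:   []string{"host=~dc1-.*"},
		extrinsic: map[string]string{"datacenter": "dc1"},
	})
	// Query side: a condition like datacenter=dc1 hits the reverse index
	// to find the records whose target series should join the result set.
	fmt.Println("records for datacenter=dc1:", m.reverse["datacenter"]["dc1"])
}
```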

@shanson7
Collaborator

shanson7 commented Aug 6, 2018

For a concrete use case, we will have tags like datacenter that will match millions of series and only have a couple of values. In this case, we would need a list of thousands of host values that map to a particular dc.

As an alternative implementation, it could be possible to send the extrinsic tags along with the normal tags, but not use them to calculate the series id. In that case, metrictank would just need to update the index when a full MetricData message is received with different extrinsic tags. If they don't change then the MetricPoint optimization can still be used. This would mean that the application of extrinsic tags would be handled externally to MT.
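As a toy illustration of that alternative: derive the series id from the name and intrinsic tags only, so re-sending the same series with different extrinsic tags never creates a new id. The hashing scheme below is an assumption for illustration, not metrictank's actual id format:

```go
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
	"strings"
)

// seriesKey hashes the name plus the sorted intrinsic tags only; extrinsic
// tags deliberately stay out of the hash, so a MetricData message carrying
// changed extrinsic tags updates the index entry instead of creating a new
// series, and the MetricPoint optimization keeps working in between.
func seriesKey(name string, intrinsic map[string]string) string {
	parts := make([]string, 0, len(intrinsic))
	for k, v := range intrinsic {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return fmt.Sprintf("%x", md5.Sum([]byte(name+";"+strings.Join(parts, ";"))))
}

func main() {
	intrinsic := map[string]string{"host": "host1"}
	// extrinsic tags, e.g. datacenter=dc1, would travel alongside in the
	// message but never feed into seriesKey, so the id stays stable.
	fmt.Println(seriesKey("cpu.percent.idle", intrinsic))
}
```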

@shanson7
Collaborator

Thinking about this a little this morning, I've got the following notes:

  1. Memory usage - The memory idx is currently about 60% of our heap usage. In our use case, every extrinsic tag we add will match almost every series. That means giant sets of metric ids.
  2. Cost of Update - We will likely do updates every few hours. Having to run every series through an expression match could be quite costly in this scenario, especially seeing as most of them are unlikely to change.

In our particular use case, what we actually need is a mapping from one tag key/value to another. As alluded to in the original post, most of ours will be keyed off of the host tag. Almost every series has a host tag, and there are many orders of magnitude more series than there are hosts. With this in mind, I propose that we allow a simple mapping of intrinsic tag to extrinsic tags.

That way we could upload something like dc=dc1 maps to host=[host1,host3,host6,...].

At query time, we determine if an expression references an extrinsic tag and do an efficient lookup (likely a map) to determine if the current series' mapped key matches the requested value. e.g. if someone asks for something like name=abc AND dc=dc1, and we find abc;host=host1 and abc;host=host2 (using the name=abc filter), we can quickly look up extrinsic_tags["dc"]["dc1"]["host1"] and extrinsic_tags["dc"]["dc1"]["host2"] to find that only host1 matches (a sketch of this lookup follows the list below).

  1. Memory Usage - With this approach, each extrinsic tag just needs the set of mapped intrinsic tags that match. This is a much smaller set in our case (about 20k vs 400M).
  2. Cost of Update - If we require that the entire set be updated at once (i.e. all mappings defined for dc=dc1 must be supplied for each update), then it's pretty efficient as a map insert/update.
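A compact Go sketch of that lookup, using the nested-map shape from the comment above (names hypothetical):

```go
package main

import "fmt"

// extrinsicTags mirrors the proposed mapping:
// meta tag key -> meta tag value -> set of intrinsic tag values it maps to,
// e.g. extrinsicTags["dc"]["dc1"] holds host1, host3, host6, ...
var extrinsicTags = map[string]map[string]map[string]struct{}{
	"dc": {
		"dc1": {"host1": {}, "host3": {}, "host6": {}},
	},
}

// matches reports whether a series' intrinsic tag value (here: its host)
// satisfies an extrinsic condition like dc=dc1, in O(1) map lookups.
// Indexing missing keys on nested maps is safe in Go and returns false.
func matches(metaKey, metaValue, intrinsicValue string) bool {
	_, ok := extrinsicTags[metaKey][metaValue][intrinsicValue]
	return ok
}

func main() {
	// Query: name=abc AND dc=dc1. Candidates found via the name=abc filter:
	// abc;host=host1 and abc;host=host2.
	fmt.Println(matches("dc", "dc1", "host1")) // true: kept
	fmt.Println(matches("dc", "dc1", "host2")) // false: filtered out
}
```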

@Dieterbe Dieterbe added this to the 1.0 milestone Aug 22, 2018
@Dieterbe Dieterbe changed the title Static Meta-Data uploading and merging into tags meta-tags (previously known as extrinsic tags) Oct 24, 2018
@Dieterbe
Contributor

Dieterbe commented Oct 24, 2018

from here on, this ticket is about the meta tags. the other tangential ideas (such as uploading generic info for display only but not for searching/filtering) can be done in a new ticket.

we're in the design phase for this feature. @replay can you share your work-in-progress design doc?

@replay
Contributor

replay commented Jan 22, 2019

A status update on where we're at:
Last week I started working on the implementation as planned in the design doc.
So far I have written the API calls to add and modify the rules which define the associations between meta-tags (extrinsic) and metric-tags (intrinsic); that's pushed in this branch: #960

Over the course of this and next week I'll implement the procedures to update the new index data structures when meta-records get added/modified/deleted and when metrics get added/deleted.
Querying the new index will be the last part; I suspect it will also be the most complicated one.

@shanson7 I would like to come up with an estimate of how much additional memory those new data structures are going to consume; I will take the current plans to optimize memory efficiency into account (@robert-milan is working on that). Could you maybe provide some numbers for your typical planned use cases of the meta tag index? Such as:

  • Number of series per MT
  • Average number of metric/intrinsic tags per series
  • Total number of meta tag rules defining associations between meta tags and metric/intrinsic tags
  • Average number of meta tags per series

@shanson7
Collaborator

> Number of series per MT

We have about 4 million series in the index per MT instance.

> Average number of metric/intrinsic tags per series

I'm not sure how to calculate this, but every series has at least 5 tags, so the average would be 6 or 7 tags.

> Total number of meta tag rules defining associations between meta tags and metric/intrinsic tags

We would likely have at least one per host, so in the tens of thousands, each one affecting a small set of the series.

> Average number of meta tags per series

Probably 5-10.

@replay
Contributor

replay commented Feb 4, 2019

For reference i'm linking the current design doc from here: https://docs.google.com/document/d/1Kk3QYd3X1yIEUcRFigEjdx23dgZMEH2lM4pmka9oAcc

@replay
Contributor

replay commented May 8, 2019

Update:
We just merged PR #1301.
#1301 merges a part of what's in the branch of #960, plus some improvements.
Next I'll rebase #960 onto the current master and then create more small PRs to merge the modifications of that branch piece by piece. That way they are easier to review, it's easier to keep them concise, and it's easier to ensure that there are no unexpected regressions.

There will be at least 4 follow-up PRs; 2-4 mostly just copy the modifications over from the branch of #960:

  1. Move the input validation for tag queries into the API layer; currently that's in the index. We discussed that in Meta tags part1: meta record data structures and corresponding CRUD api calls #1301 (comment), and this should be relatively simple
  2. Refactor the query expression type to make it more flexible: to implement the querying/filtering by meta record we need to be able to build sub-queries from meta records, which requires that extra flexibility
  3. Start using the meta records to build sub-queries, this will allow us to query by meta record
  4. Implement the enrichment (at first without a cache)

After the above is done, I'll need to:

  • Add a way to persist the meta records into a permanent store
  • Implement the ability to swap out a whole set of meta records, instead of updating them one-by-one
  • Implement the enrichment cache

@replay
Contributor

replay commented Aug 28, 2019

Status Update:

This refers to the above comment (#660 (comment)):

The features listed in points 1 - 4 are done and working in my test environment. The "enrichment cache" mentioned at the bottom is also done and merged. These changes have not been deployed in any production environment yet, as far as I'm aware.

A PR for the ability to swap out a whole set of meta records is waiting for review: #1442

The persisting of meta records is not implemented yet; I'm currently working on that. The plan is to add a new table in Cassandra/BigTable if the feature flag meta-tag-support is enabled; on startup the records would be read from there.

Ideas for improvements:

  • We want to improve how the meta records get propagated across a cluster. Currently this is done via HTTP calls between cluster nodes; if one cluster node is not available, the client that submitted the original request will receive an error indicating that. This is not optimal, because with large clusters it can be normal that some number of MTs are down at any point in time, so we want to switch to a mechanism that allows us to come to a consensus among all nodes without requiring them all to be available at the same time.
  • I believe there is room for improvement in how the evaluation order of a given set of query expressions gets determined. Currently, when MT determines the order in which to evaluate the given expressions, it only takes their operators and the cardinality of the involved parts of the metric index (intrinsic index) into account. This could be made smarter by also taking the cardinality of the meta tag index into account, if the meta tag feature is enabled.

@agao48
Contributor

agao48 commented Sep 10, 2019

We did some preliminary research with Bloomberg's metrictank setup, enabling meta tags and comparing setups with varying numbers of meta tags (from no meta tags up to 3 meta tags). Details can be found here: https://gist.github.com/agao48/e3e2681d3652b8ca083b32b40733e550. More information, like memory and CPU performance while ingesting and querying, can also be provided.

@replay
Contributor

replay commented Sep 11, 2019

Thanks for the results @agao48.

Based on your profile, it looks to me like the enrichment phase is slower than expected. The enrichment works like this (a rough Go sketch follows the list):

  • When the first query gets received after some meta tag records have been modified, a new enricher gets instantiated. That enricher has a set of filter functions for all the defined meta tags.
  • After the lookup of series is done, each series gets passed to the enricher, which does a reverse lookup over the meta tag index; this yields the set of meta tags that need to be associated with each series in the result set.
  • The correct meta tags then get associated with each series in the result set, and this result also gets cached for the next time this metric needs to be enriched.
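A rough sketch of that flow with the result cache, under assumed shapes (the real enricher in metrictank differs in detail):

```go
package main

import "fmt"

type metaRec struct {
	filter func(tags map[string]string) bool // compiled from the record's tag queries
	tags   map[string]string                 // meta tags to attach on a match
}

// enricher is rebuilt whenever meta records change; per-series results are
// cached so repeat queries skip the per-record filter functions entirely.
type enricher struct {
	records []metaRec
	cache   map[string]map[string]string // series id -> resolved meta tags
}

func (e *enricher) enrich(id string, tags map[string]string) map[string]string {
	if cached, ok := e.cache[id]; ok {
		return cached // served from the enrichment cache
	}
	out := map[string]string{}
	for _, r := range e.records {
		if r.filter(tags) {
			for k, v := range r.tags {
				out[k] = v
			}
		}
	}
	e.cache[id] = out
	return out
}

func main() {
	e := &enricher{
		records: []metaRec{{
			filter: func(t map[string]string) bool { return t["host"] == "host1" },
			tags:   map[string]string{"dc": "dc1"},
		}},
		cache: map[string]map[string]string{},
	}
	fmt.Println(e.enrich("abc;host=host1", map[string]string{"host": "host1"}))
	fmt.Println(e.enrich("abc;host=host1", nil)) // second call: cache hit
}
```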

I have a few questions regarding your benchmarks:

  1. In your benchmarks with no meta tags, was the meta tag support feature flag turned on or off? This makes a difference, because if the support is turned on, then even if no meta tags are defined certain parts of the lookup will be a bit slower.
  2. In your test query seriesByTag('namespace=os','name=cpu.percent.idle.g'), is the namespace tag a meta tag? When you did your benchmarks without meta tags, did you run the same query while the namespace tag wasn't present? Or did you run a different query during the benchmarks without meta tags?
  3. There is a config setting called enrichment-cache-size; was that set to the default of 10000? If so, then I'm surprised that so much CPU time was spent on the enrichment, because if your result set consisted of only 21 time series, then their enrichment results should have been cached on the first query and reused ever after. In that case I'd need to check what I can do to improve the enrichment cache speed.

@replay
Contributor

replay commented Sep 11, 2019

For completeness I'm copy-pasting the reply that I got from @agao48:

  1. The original benchmark was with meta tags enabled but no meta tags added. I updated the gist with meta tags completely disabled. As you stated already, lookup was faster when the meta tag feature was completely disabled.
  2. That query contains no meta tags at all, so all the benchmarks use a query with no meta tags. I did run a test locally where we specified one meta tag, and lookup seemed to be faster. If you would like, I can rerun those tests to get you the stats there.
  3. enrichment-cache-size was set to the default of 10000. Before each test, I queried the data once, also thinking the data should get cached. I can experiment with an extremely low value to test the impact of cache size if you would like.

@replay
Contributor

replay commented Sep 11, 2019

@agao48 I think we have found a bug in how the enricher gets instantiated and fixed it with this PR:
#1455

If you get a chance, could you please retry the same benchmark with the latest master that includes this PR? Thanks.

@agao48
Contributor

agao48 commented Sep 11, 2019

@replay Rebuilt and tested with the master that has that fix. Results in table format below; it looks a lot better.

Query:

GET http://localhost:6060/render?target=sumSeries(seriesByTag('namespace=os','name=cpu.percent.idle.g'))&from=1567922400&until=1567954800&format=json

Test: $vegeta attack -duration 120s -rate 10 -timeout 0

Before: results from initial benchmarking

After: results after fixing the bug in the enricher instantiation

| Latencies | 1 tag before | 1 tag after | 2 tags before | 2 tags after | 3 tags before | 3 tags after |
| --- | --- | --- | --- | --- | --- | --- |
| mean | 205.651992ms | 12.962963ms | 213.400526ms | 13.47646ms | 220.924238ms | 13.324256ms |
| 50th pct | 189.744883ms | 12.187033ms | 188.395671ms | 12.517429ms | 197.006305ms | 12.30844ms |
| 90th pct | 291.740413ms | 18.750181ms | 358.45829ms | 20.224633ms | 344.310485ms | 20.233422ms |
| 99th pct | 400.456912ms | 27.464192ms | 505.764124ms | 31.077227ms | 591.181863ms | 34.614008ms |
| max | 786.23406ms | 46.510145ms | 834.770011ms | 49.137374ms | 894.33863ms | 99.592477ms |

@Dieterbe Dieterbe modified the milestones: vnext, sprint-2 Oct 7, 2019
@fkaleo fkaleo modified the milestones: sprint-2, sprint-3 Oct 28, 2019
@replay replay modified the milestones: sprint-3, sprint-4 Nov 18, 2019
@robert-milan robert-milan modified the milestones: sprint-4, sprint-5 Dec 9, 2019
@replay
Contributor

replay commented Dec 17, 2019

Can we close this issue? As far as I'm aware the feature is "done", as in "it works". If any further issues come up, they would be filed as new issues. Or would you prefer to wait until you've deployed it, @agao48?

@shanson7
Collaborator

I'm ok with closing this. I think we can open more specific issues if/when we find the need.
