
Add optimizations for meta records of which the underlying expressions use equal operator #1542

Merged: 4 commits merged into master from 20191119_issue_ops-524 on Dec 6, 2019

Conversation

@replay (Contributor) commented Nov 20, 2019

This PR is based on #1541; it does not make sense to review it before that one is merged.

When we filter by meta records whose underlying expressions all use the equal operator, we can save a lot of tag index lookups by building a set of acceptable tags from the meta record expressions and then simply filtering by that set, instead of evaluating each expression with a lookup in the tag index.
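To illustrate the idea (a minimal sketch with made-up names, not the actual metrictank code): when every expression behind a meta record is an equality like host=a, the tag=value pairs can be collected into a set once, and each candidate metric is then checked against that set instead of triggering a tag index lookup per expression.

package main

import "fmt"

// buildEqualitySet collects "tag=value" pairs from equality-only expressions.
func buildEqualitySet(expressions [][2]string) map[string]struct{} {
    set := make(map[string]struct{}, len(expressions))
    for _, expr := range expressions {
        set[expr[0]+"="+expr[1]] = struct{}{}
    }
    return set
}

// matchesAny reports whether any of the metric's tags is in the set.
func matchesAny(metricTags []string, set map[string]struct{}) bool {
    for _, tag := range metricTags {
        if _, ok := set[tag]; ok {
            return true
        }
    }
    return false
}

func main() {
    // e.g. a meta tag dc=us-east1 defined by host=a OR host=b OR host=c
    set := buildEqualitySet([][2]string{{"host", "a"}, {"host", "b"}, {"host", "c"}})
    fmt.Println(matchesAny([]string{"host=b", "role=db"}, set)) // true
    fmt.Println(matchesAny([]string{"host=z", "role=db"}, set)) // false
}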

These 3 benchmarks test cases where we filter by meta tags that have a large number of underlying expressions, all of which use the equal operator. For example, a datacenter tag mapping to a list of host=X tag/value pairs.

benchmark                                                                                  old ns/op        new ns/op       delta
BenchmarkFilter100kByMetaTagWithIndexSize1mAnd50kMetaRecords-12                            95441374800      285884142       -99.70%
BenchmarkFilter10kByMetaTagWithIndexSize1mAnd10kMetaRecords-12                             2008162900       30703885        -98.47%
BenchmarkFilter100kByMetaTagWithIndexSize1mAnd50kMetaRecordsWithMultipleExpressions-12     115969750200     12734048100     -89.02%

grafana/metrictank-ops/issues/524

@Dieterbe (Contributor) commented

Are there any types of queries that might get slowed down because of this PR, or the other linked one?

@replay (Contributor, Author) commented Nov 20, 2019

In the case of this PR, #1542, I'm quite confident that it won't slow down any other queries. It detects a specific situation, namely that all expressions underlying a meta tag used to filter down the result set use the = operator, and then applies an optimization that is very specific to that situation. Since this is a very common use case which we know big customers rely on often, it makes sense to add an optimization tailored to it.

In the case of #1541 it is possible that certain queries get slowed down: for example, if a meta tag is very cheap to evaluate because it only uses relatively simple operators (such as =) and only matches a small number of metrics, while another expression in the same query operates on the metric tag index using expensive operators (such as =~) and matches a large number of series.
Based on what we know about the common use of the meta tag feature, this is an unlikely scenario, so we optimize for the likely one.
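As a rough illustration of the kind of cost-based ordering trade-off described above (hypothetical heuristic and names, not the actual #1541 logic): expressions are ranked by an estimated cost before being evaluated against the index, and meta tag expressions may be deprioritized even when their underlying expressions are cheap.

package main

import (
    "fmt"
    "sort"
)

type expression struct {
    raw      string
    isMeta   bool // filters on a meta tag rather than a plain metric tag
    hasRegex bool // uses =~ rather than =
}

// estimatedCost is a toy heuristic: regex operators and meta tags are
// assumed to be more expensive to resolve than plain equality lookups,
// which is why a cheap meta tag can still end up being evaluated late.
func estimatedCost(e expression) int {
    cost := 1
    if e.hasRegex {
        cost += 10
    }
    if e.isMeta {
        cost += 5
    }
    return cost
}

func main() {
    exprs := []expression{
        {raw: "name=~cpu.*", hasRegex: true},
        {raw: "dc=us-east1", isMeta: true},
        {raw: "host=a"},
    }
    // evaluate the (estimated) cheapest expression against the index first
    sort.Slice(exprs, func(i, j int) bool {
        return estimatedCost(exprs[i]) < estimatedCost(exprs[j])
    })
    for _, e := range exprs {
        fmt.Println(e.raw) // host=a, dc=us-east1, name=~cpu.*
    }
}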

@replay force-pushed the 20191119_issue_ops-524 branch from 832c6bd to 677eb7d on November 20, 2019 16:10
@replay force-pushed the 20191119_issue_ops-524 branch from 677eb7d to 3118ad3 on December 4, 2019 15:28
@replay (Contributor, Author) commented Dec 4, 2019

rebased this onto the latest master

@@ -925,6 +925,8 @@ func (m *UnpartitionedMemoryIdx) add(archive *idx.Archive) {
 	path := def.NameWithTags()

 	if TagSupport {
+		sort.Strings(def.Tags)
Contributor commented:

Is there a case where MetricData.SetId() has not been called prior to this line being executed? MetricData.SetId() sorts the tags. I thought we had added this before and then removed it.

@replay (Contributor, Author) commented Dec 6, 2019:

There are two call paths leading to .add().

  • When loading the index from the backend store, we add entries directly to the index without calling SetId(). Of course we can assume that these entries were written by MT, but we don't really know what users may do with their stores, so I'd prefer not to rely on that, especially because, due to the next point, we may ingest unsorted tag slices and persist them to the store unsorted.
  • When ingesting data:
    • When ingesting from Kafka, with or without the write queue, .SetId() does not get called. We could assume that the producer always sorts the strings, but I don't think we should rely on that.
    • When ingesting from carbon we do call .SetId(), so that's fine.

I think it's also worth considering that if a slice is already sorted, then calling sort.Strings() on it should be very cheap. So, just to not have to rely on assumptions, I think this is still worth it.
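A small sketch of that reasoning (illustrative only; the PR itself just calls sort.Strings unconditionally): sorting defensively is cheap when the input is already ordered, and the sort can even be skipped entirely with an explicit check.

package main

import (
    "fmt"
    "sort"
)

// ensureSortedTags sorts the tags only if they are not already sorted;
// on input that is already ordered this is just a linear scan.
func ensureSortedTags(tags []string) {
    if !sort.StringsAreSorted(tags) {
        sort.Strings(tags)
    }
}

func main() {
    fromCarbon := []string{"dc=a", "host=b", "role=db"} // already sorted by SetId()
    fromKafka := []string{"role=db", "dc=a", "host=b"}  // arrival order not guaranteed

    ensureSortedTags(fromCarbon)
    ensureSortedTags(fromKafka)

    fmt.Println(fromCarbon) // [dc=a host=b role=db]
    fmt.Println(fromKafka)  // [dc=a host=b role=db]
}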

@replay (Contributor, Author) commented:

Assuming that this is not a blocker, I'll go ahead and merge. We can still discuss removing it again after merging.

@robert-milan (Contributor) left a review comment:

Aside from the one comment, looks good.

@replay merged commit e607c4b into master on Dec 6, 2019
@replay deleted the 20191119_issue_ops-524 branch on December 6, 2019 20:16