Add optimizations for meta records of which the underlying expressions use equal operator #1542
Are there any types of queries that might get slowed down because of this PR, or the other one it links to?
In the case of this PR #1542 I'm quite confident that it won't slow down any other queries. It detects a specific situation, which is that all expressions underlying a meta tag which is used to filter down the result set are using the
In the case of #1541 it is possible that certain queries get slowed down if for example a meta tag is very cheap to evaluate because it only uses relatively simple operators (such as
Rebased this onto the latest master.
@@ -925,6 +925,8 @@ func (m *UnpartitionedMemoryIdx) add(archive *idx.Archive) {
	path := def.NameWithTags()

	if TagSupport {
		sort.Strings(def.Tags)
Is there a case where MetricData.SetId() has not been called prior to this line being executed? MetricData.SetId() sorts the tags. I thought we had added this before and then removed it.
There are two call paths leading to .add():
- When loading the index from the backend store, we directly add it to the index without calling SetId(). Of course we can assume that these entries have been written by MT, but we don't really know what users may do with their stores, so I'd prefer not to rely on that. Especially because, due to the next point, we may ingest unsorted tag slices and store them in the store unsorted.
- When ingesting data:
  - When ingesting from kafka, with or without write queue, .SetId() does not get called. We could assume that the producer always sorts the strings, but I don't think we should rely on that.
  - When ingesting from carbon we do call .SetId(), so that's fine.
I think it's also worth considering that if a slice is already sorted, then calling sort.Strings() on it should be very cheap. So just to not have to rely on assumptions, I think this is still worth it.
Assuming that this is not a blocker, I'll already go ahead and merge. We can still discuss removing it again after merging.
Aside from the one comment, looks good.
This PR is based on #1541, so it does not make sense to review it before that one is merged.
When we filter by meta records whose underlying expressions all use the equal operator, we can save a lot of tag index lookups by building a set of acceptable tags from the meta record expressions and then simply filtering by that set, instead of having to evaluate each expression with a lookup on the tag index.
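The idea can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual code: the expression type and the functions buildAcceptSet and matches are hypothetical, and the sketch assumes a metric passes the filter if any one of its tags appears in the set.

```go
package main

import "fmt"

// expression is a hypothetical stand-in for one meta record
// expression that uses the equal operator, e.g. host=a.
type expression struct {
	Key, Value string
}

// buildAcceptSet collects the key=value pair of every expression
// into a set, so membership can be checked with a map lookup.
func buildAcceptSet(exprs []expression) map[string]struct{} {
	set := make(map[string]struct{}, len(exprs))
	for _, e := range exprs {
		set[e.Key+"="+e.Value] = struct{}{}
	}
	return set
}

// matches reports whether at least one of the metric's tags is in
// the accept set (assumed semantics, see the lead-in).
func matches(tags []string, accept map[string]struct{}) bool {
	for _, t := range tags {
		if _, ok := accept[t]; ok {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical meta record: a datacenter mapping to a list of hosts.
	exprs := []expression{{"host", "a"}, {"host", "b"}, {"host", "c"}}
	accept := buildAcceptSet(exprs)

	fmt.Println(matches([]string{"env=prod", "host=b"}, accept)) // true
	fmt.Println(matches([]string{"env=prod", "host=z"}, accept)) // false
}
```

The set is built once per query, after which each candidate metric costs one map lookup per tag rather than one tag index lookup per expression.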
These 3 benchmarks test cases where we filter by meta tags that have a large number of underlying expressions, all of which use the equal operator. For example, a datacenter mapping to a list of host=X tag/value pairs.
grafana/metrictank-ops/issues/524