Regex optimisation on index does redundant lookups #2906

@gouthamve

When caching index entries, we look up the entire row (in order to cache it) and only then filter it down to the entries that match:

    type filteringBatchIter struct {
        query chunk.IndexQuery
        chunk.ReadBatchIterator
    }

    func (f *filteringBatchIter) Next() bool {
        for f.ReadBatchIterator.Next() {
            rangeValue, value := f.ReadBatchIterator.RangeValue(), f.ReadBatchIterator.Value()
            if len(f.query.RangeValuePrefix) != 0 && !bytes.HasPrefix(rangeValue, f.query.RangeValuePrefix) {
                continue
            }
            if len(f.query.RangeValueStart) != 0 && bytes.Compare(f.query.RangeValueStart, rangeValue) > 0 {
                continue
            }
            if len(f.query.ValueEqual) != 0 && !bytes.Equal(value, f.query.ValueEqual) {
                continue
            }
            return true
        }
        return false
    }

    // QueryFilter wraps a callback to ensure the results are filtered correctly;
    // useful for the cache and Bigtable backend, which only ever fetches the whole
    // row.
    func QueryFilter(callback Callback) Callback {
        return func(query chunk.IndexQuery, batch chunk.ReadBatch) bool {
            return callback(query, &filteringBatch{query, batch})
        }
    }

and
// We cache the entire row, so filter client side.
callback = chunk_util.QueryFilter(callback)

Now we introduced an optimisation that does the following:

    set := FindSetMatches(matcher.Value)
    for _, v := range set {
        var qs []IndexQuery
        qs, err = c.schema.GetReadQueriesForMetricLabelValue(from, through, userID, metricName, matcher.Name, v)
        if err != nil {
            break
        }
        queries = append(queries, qs...)
    }

Basically, if we have a lookup like label=~"a|b|c|d", we now split it into 4 separate lookups: label=a, label=b, and so on. Because the cache always fetches whole rows, each of these lookups loads the entire row again. We should push this optimisation down into filteringBatchIter so that nothing is loaded more than once.
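
To make the proposal concrete, here is a rough, self-contained sketch of the idea. It is not the actual Cortex code: `entry`, `setQuery`, `rowStore`, `filterRow`, `lookupPerValue` and `lookupSet` are all hypothetical names made up for illustration, and the sketch assumes every value in the set maps to the same row. It contrasts one row fetch per value with a single fetch followed by a client-side filter over the whole set.

```go
package main

import (
	"bytes"
	"fmt"
)

// entry is a hypothetical stand-in for one index entry of a row.
type entry struct {
	RangeValue []byte
	Value      []byte
}

// setQuery is a hypothetical variant of an index query that accepts
// several values instead of a single ValueEqual.
type setQuery struct {
	RangeValuePrefix []byte
	ValueIn          [][]byte // empty means "do not filter on value"
}

// rowStore simulates a backend that can only return whole rows.
// It counts how many times a row was fetched.
type rowStore struct {
	rows    map[string][]entry
	fetches int
}

func (s *rowStore) fetchRow(hashValue string) []entry {
	s.fetches++
	return s.rows[hashValue]
}

// filterRow keeps the entries that match the prefix and whose value is
// in the query's set. The whole set is checked in one pass over the row.
func filterRow(row []entry, q setQuery) []entry {
	wanted := make(map[string]struct{}, len(q.ValueIn))
	for _, v := range q.ValueIn {
		wanted[string(v)] = struct{}{}
	}
	var out []entry
	for _, e := range row {
		if len(q.RangeValuePrefix) != 0 && !bytes.HasPrefix(e.RangeValue, q.RangeValuePrefix) {
			continue
		}
		if len(wanted) != 0 {
			if _, ok := wanted[string(e.Value)]; !ok {
				continue
			}
		}
		out = append(out, e)
	}
	return out
}

// lookupPerValue mirrors the behaviour described in the issue:
// one query per value, so the row is fetched len(values) times.
func lookupPerValue(s *rowStore, hashValue string, values []string) []entry {
	var out []entry
	for _, v := range values {
		q := setQuery{ValueIn: [][]byte{[]byte(v)}}
		out = append(out, filterRow(s.fetchRow(hashValue), q)...)
	}
	return out
}

// lookupSet fetches the row once and filters against the whole set.
func lookupSet(s *rowStore, hashValue string, values []string) []entry {
	q := setQuery{}
	for _, v := range values {
		q.ValueIn = append(q.ValueIn, []byte(v))
	}
	return filterRow(s.fetchRow(hashValue), q)
}

func main() {
	rows := map[string][]entry{
		"row": {
			{RangeValue: []byte("r1"), Value: []byte("a")},
			{RangeValue: []byte("r2"), Value: []byte("b")},
			{RangeValue: []byte("r3"), Value: []byte("c")},
			{RangeValue: []byte("r4"), Value: []byte("e")},
		},
	}
	values := []string{"a", "b", "c", "d"}

	naive := &rowStore{rows: rows}
	fmt.Println(len(lookupPerValue(naive, "row", values)), naive.fetches)

	pushed := &rowStore{rows: rows}
	fmt.Println(len(lookupSet(pushed, "row", values)), pushed.fetches)
}
```

Both variants return the same three entries here, but the per-value variant fetches the row four times while the set variant fetches it once. In the real code the equivalent change would presumably live in the filtering iterator and the query type, which this sketch does not attempt to model.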
