
add findCache for memoryIdx #1233

Merged
merged 22 commits into from
Mar 12, 2019

Conversation

woodsaj
Member

@woodsaj woodsaj commented Mar 8, 2019

Cache patterns and their results in an LRU.
When new defs are added to the index, only cached patterns that match
the new series name are invalidated. The cache is purged after every
prune task runs and after any call to delete items from the index.
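
For orientation, here is a minimal sketch of the shape this takes. The type names, field layout, and the use of hashicorp/golang-lru are assumptions for illustration, not necessarily the exact code in this PR.

package memory

import (
	"sync"

	lru "github.com/hashicorp/golang-lru"
)

// FindCache caches find patterns and their resulting nodes, per org.
type FindCache struct {
	sync.RWMutex
	cache map[uint32]*lru.Cache // orgId -> LRU of pattern -> result nodes
	size  int                   // max entries per org LRU
}

func NewFindCache(size int) *FindCache {
	return &FindCache{
		cache: make(map[uint32]*lru.Cache),
		size:  size,
	}
}

// PurgeAll drops every org's cached results, e.g. after a prune task
// runs or after deletes from the index.
func (c *FindCache) PurgeAll() {
	c.Lock()
	defer c.Unlock()
	c.cache = make(map[uint32]*lru.Cache)
}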

@woodsaj woodsaj requested review from replay and robert-milan March 8, 2019 19:26
@woodsaj woodsaj force-pushed the findCache branch 3 times, most recently from 8e3f270 to 2c70e84 on March 11, 2019 06:13
woodsaj added 2 commits March 11, 2019 14:43
When new series are received we need to invalidate any findCache
patterns that match the series name. If there are lots of new series
being created, this can have a negative performance impact. This change
causes the cache to be disabled for 1 minute when the rate of new series
is higher than what we can process.
- add config settings for:
  - find-cache-invalidate-queue: number of new series names to process.
    Each new series name is compared against expressions in the findCache;
    if an expression matches the new name, it is removed from the cache.
  - find-cache-backoff: amount of time to disable the findCache for when
    the invalidate-queue fills up. If there are lots of new series being
    added then scanning the cache for patterns that match can become
    expensive, so it is best to just disable the cache for a while.
- log a message when the cache is disabled due to the invalidateQueue
  limit being reached.
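
Sketched out, the overflow handling described above might look roughly like this. It continues the sketch from the PR description; the invalidateRequest type, the extra FindCache fields, and the queue mechanics are assumptions for illustration (stdlib log and time imports assumed).

// Continuing the sketch above; additionally assumes these FindCache fields:
//
//	invalidateQueue chan invalidateRequest // sized via find-cache-invalidate-queue
//	backoff         map[uint32]time.Time   // org -> end of backoff window
//	backoffTime     time.Duration          // find-cache-backoff

type invalidateRequest struct {
	orgId uint32
	name  string
}

// InvalidateFor queues a new series name so that cached patterns matching
// it can be removed. If the queue is full we cannot keep up, so rather
// than blocking ingestion we drop the org's cache and back off for a while.
func (c *FindCache) InvalidateFor(orgId uint32, name string) {
	c.Lock()
	defer c.Unlock()
	select {
	case c.invalidateQueue <- invalidateRequest{orgId: orgId, name: name}:
	default:
		log.Printf("findCache: invalidate queue full, disabling cache for org %d for %s", orgId, c.backoffTime)
		c.backoff[orgId] = time.Now().Add(c.backoffTime)
		delete(c.cache, orgId)
	}
}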
@replay
Contributor

replay commented Mar 11, 2019

Over the past few weeks we've been trying to make MT use less memory while sacrificing a bit more CPU (e.g. object interning). This change goes in the opposite direction again (more mem / less CPU), so maybe we should disable the find cache by default and only enable it when we know we need it?

findCacheMiss.Inc()
return nil, false
}
nodes, ok := cache.Get(pattern)
Contributor

@robert-milan robert-milan Mar 11, 2019


Is there a chance that cache could have already been deleted before attempting this call in some fringe situations?

Edit:
I think all of the calls are wrapped inside RLocks or Locks in the index.

Member Author

@woodsaj woodsaj Mar 11, 2019


It is possible that one thread calls cache, ok := c.cache[orgId] then releases the lock, and that before nodes, ok := cache.Get(pattern) is executed another thread gets a write lock and calls delete(c.cache, orgId). But it doesn't really matter. This is still safe, as the items in c.cache[orgId] are pointers. The end result is that calls to findCache.Get() will return results based on the content of the cache when the RLock() was acquired, and not on the contents of the cache at the specific time that nodes, ok := cache.Get(pattern) is executed.
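
Condensed, the access pattern being described looks like this (a sketch: Node stands in for the index's tree node type, and findCacheHit is an assumed counter alongside the findCacheMiss shown in the diff):

func (c *FindCache) Get(orgId uint32, pattern string) ([]*Node, bool) {
	c.RLock()
	cache, ok := c.cache[orgId]
	c.RUnlock()
	if !ok {
		findCacheMiss.Inc()
		return nil, false
	}
	// Even if another goroutine grabs the write lock and deletes
	// c.cache[orgId] right here, cache still points to the LRU that
	// existed when the RLock was held, so the lookup below is safe.
	nodes, ok := cache.Get(pattern)
	if !ok {
		findCacheMiss.Inc()
		return nil, false
	}
	findCacheHit.Inc()
	return nodes.([]*Node), true
}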

@woodsaj
Member Author

woodsaj commented Mar 11, 2019

Over the past few weeks we've been trying to make MT use less memory while sacrificing a bit more CPU (e.g. object interning). This change goes in the opposite direction again (more mem / less CPU),

It is more memory, but not much, as the cache is just a map of strings that point to slices of pointers. Also, though heap memory could potentially be higher, instances with moderate query loads will see a reduction in allocations (which are needed to search the tree), which will help keep RSS lower.

Contributor

@robert-milan robert-milan left a comment


How are the benchmarks looking?

Contributor

@replay replay left a comment


LGTM

@Dieterbe
Contributor

+1 to what Robert said.
Any PR with an impact on performance should show relevant benchmark results.

@woodsaj
Member Author

woodsaj commented Mar 12, 2019

Comparison between master and this branch for the relevant benchmarks:

benchmark                                         old ns/op     new ns/op     delta
BenchmarkConcurrent4Find/partitioned-8            150287        40939         -72.76%
BenchmarkConcurrent4Find/unPartitioned-8          43931         22132         -49.62%
BenchmarkConcurrent8Find/partitioned-8            201209        37329         -81.45%
BenchmarkConcurrent8Find/unPartitioned-8          59867         19462         -67.49%
BenchmarkConcurrentInsertFind/partitioned-8       412081        42666         -89.65%
BenchmarkConcurrentInsertFind/unPartitioned-8     364459        20785         -94.30%

benchmark                                         old allocs     new allocs     delta
BenchmarkConcurrent4Find/partitioned-8            2324           257            -88.94%
BenchmarkConcurrent4Find/unPartitioned-8          637            187            -70.64%
BenchmarkConcurrent8Find/partitioned-8            2324           257            -88.94%
BenchmarkConcurrent8Find/unPartitioned-8          637            187            -70.64%
BenchmarkConcurrentInsertFind/partitioned-8       11788          258            -97.81%
BenchmarkConcurrentInsertFind/unPartitioned-8     10115          187            -98.15%

benchmark                                         old bytes     new bytes     delta
BenchmarkConcurrent4Find/partitioned-8            95997         27794         -71.05%
BenchmarkConcurrent4Find/unPartitioned-8          34411         18879         -45.14%
BenchmarkConcurrent8Find/partitioned-8            95987         27792         -71.05%
BenchmarkConcurrent8Find/unPartitioned-8          34411         18881         -45.13%
BenchmarkConcurrentInsertFind/partitioned-8       524699        27841         -94.69%
BenchmarkConcurrentInsertFind/unPartitioned-8     484934        18897         -96.10%

@Dieterbe
Contributor

This needs more comments about the design and implementation.
The backoff/InvalidateQueue logic in particular is hard to follow.

@Dieterbe
Contributor

Can we add a fast path for the find-cache-disabled case, so that it immediately returns out of Add/Purge/.. calls when disabled?

@woodsaj
Member Author

woodsaj commented Mar 12, 2019

Can we add a fast path for the find-cache-disabled case, so that it immediately returns out of Add/Purge/.. calls when disabled?

We already do that. Caches are per orgId; if c.cache[orgId] doesn't exist we immediately return.

@woodsaj woodsaj merged commit 707e6ce into master Mar 12, 2019
@woodsaj woodsaj deleted the findCache branch March 12, 2019 19:48
@Dieterbe
Contributor

Dieterbe commented Mar 13, 2019

Can we add a fast path for the find-cache-disabled case, so that it immediately returns out of Add/Purge/.. calls when disabled?

We already do that. Caches are per orgId; if c.cache[orgId] doesn't exist we immediately return.

This doesn't seem to be true. Add() will simply add the new cache if it doesn't exist yet, so any subsequent calls will see a cache and operate on it.

@woodsaj
Member Author

woodsaj commented Mar 13, 2019

This doesn't seem to be true. Add() will simply add the new cache if it doesn't exist yet, so any subsequent calls will see a cache and operate on it.

https://github.com/grafana/metrictank/blob/master/idx/memory/find_cache.go#L76-L79

@Dieterbe
Contributor

Not sure what you're trying to say here. t would be time.Time{} there (the zero time), so that branch wouldn't be taken. What am I missing?

@woodsaj
Member Author

woodsaj commented Mar 13, 2019

t would be time.Time{}

If the cache is disabled, t will be a time in the future. https://github.com/grafana/metrictank/blob/master/idx/memory/find_cache.go#L138

@Dieterbe
Contributor

Future-proofing the above URLs:

// dont init the cache if we are in backoff mode.
if time.Until(t) > 0 {
	return
}

c.backoff[orgId] = time.Now().Add(c.backoffTime)
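
In other words: when the invalidate queue overflows, c.backoff[orgId] is set to a time in the future, and the first branch above then prevents Add() from re-initializing that org's cache until the backoff expires.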

@Dieterbe
Contributor

We cleared it up in a call: the cache cannot be disabled.
