-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear thread local values on UTF8TaxonomyWriterCache.close() #12013
Conversation
Thanks, it looks better to use As a followup can we open an issue to move away from Threadlocal storage here by default? I don't understand why we have this "cache" with this description:
Also, I'm trying to remove ThreadLocal (and CloseableThreadLocal) usages elsewhere in the codebase (e.g. #11998). These threadlocals are problematic in java apps that have many threads (or high thread churn rate), just as CloseableThreadLocal documents and attempts to workaround. At a glance, seems to me that using a bounded LRU cache would be much more reasonable... |
Also (summoning my inner @uschindler), curious what the perf impact is if we simply remove these caches completely. Maybe they are not relevant to the performance anymore due to faceting changes (e.g. storing labels in docvalues rather than payloads) ? Maybe they are even hurting performance? |
Please remove the cache. Totally useless. It hurts more because the escape analysis can't work. My rule: Never ever cache object instances to prevent object creation and help GC. This was needed in in Java 6 and java 7, but not anymore. It has opposite problem: it actually creates those objects on heap and stresses GC. |
Yeah, I'm highly suspicious of this cache. I dug into this and found that previously stored fields (!) were used for this lookup. So no surprise there was a cache around it, since this is a slow approach!!! But @gautamworah96 fixed this, and these days BinaryDocValues are used instead: see #10490 IMO there is no sense in caching BinaryDocValues lookups into the heap, it is already mmap'd and the operating system will do this. |
My understanding is that It seems that we need such a cache to get ordinals for newly added categories from documents that are still pending flush. * <p>The reason why the caller needs to know if part of the cache was cleared is that in that
* case it will have to commit its on-disk index (so that all the latest category additions can be
* searched on disk, if we can't rely on the cache to contain them). However, with faster BinaryDocValue fields, maybe we don't need the "never evicting UTF8TaxonomyWriterCache" anymore. And we could make LruTaxonomyWriterCache as the default? I can close this PR and make that change if it makes sense. Or we can merge this change and take it up in a separate issue. Are there faceting specific benchmarks that can help validate the change? |
OK, crazy that having no cache at all could cause unnecessary amounts of flush/reopens. We want at least a small one.
+1 to make the LRU cache the default. And please, lets also remove this ThreadLocal-based cache completely in main branch, and deprecate in 9.5 |
Created a separate PR - #12092 to remove support for |
PR - #12093 to deprecate |
UTF8TaxonomyWriterCache
keeps a ThreadLocalBytesRefBuilder
, the bytes for which, are not removed on thread close. This leads to memory leaks.The change uses
CloseableThreadLocal
instead, and closes out the object on cache.close()Addresses #12000