Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits#25644
colings86 merged 4 commits into elastic:master from colings86:fix/24986
Yep, only leaving it here until I ensure the tests pass so I have a reference of the old code
isn't this also changing the order in which the hits are being serialized?
Yep, I pushed a change just before you commented :)
jpountz left a comment
The PR looks good to me as-is, but I left some suggestions for improvement.
Matter of taste, but I tend to like method refs better, i.e. Arrays.sort(hits, Comparator.comparing(SearchHit::docId))
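A minimal sketch of the method-reference form. `Hit` is a hypothetical stand-in for `SearchHit` (only a `docId()` accessor is assumed), and `Comparator.comparingInt` is used so the int key is not boxed. Sorting a copy rather than the incoming array is an assumption about how the serialization-order concern could be handled, not a quote of the actual change.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortHitsByDocId {
    // Hypothetical stand-in for SearchHit; only docId() matters for this sketch.
    public static final class Hit {
        private final int docId;

        public Hit(int docId) {
            this.docId = docId;
        }

        public int docId() {
            return docId;
        }
    }

    // Returns a copy sorted by doc ID, leaving the caller's array (and its order) untouched.
    public static Hit[] sortedByDocId(Hit[] hits) {
        Hit[] copy = hits.clone();
        Arrays.sort(copy, Comparator.comparingInt(Hit::docId));
        return copy;
    }
}
```

Visiting hits in doc ID order is what lets a per-segment doc values iterator be advanced forward and shared across hits.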
you could do if (subReaderContext == null || hit.docId() >= subReaderContext.docBase + subReaderContext.reader().maxDoc()) to avoid doing the ReaderUtil.subIndex binary search for every doc
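A rough sketch of that check, assuming hits are visited in increasing doc ID order. `Leaf` and `LeafLookup` are hypothetical stand-ins for Lucene's leaf reader context and the fetch loop, and `subIndex` here is a hand-written binary search that plays the role of `ReaderUtil.subIndex`; it is not the Lucene implementation.

```java
public class LeafLookup {
    // Hypothetical stand-in for a segment: it covers doc IDs [docBase, docBase + maxDoc).
    public static final class Leaf {
        public final int docBase;
        public final int maxDoc;

        public Leaf(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }
    }

    private final Leaf[] leaves; // contiguous and ordered by docBase
    private Leaf current;        // leaf that contained the previous doc
    private int searches;        // how many binary searches were actually run

    public LeafLookup(Leaf[] leaves) {
        this.leaves = leaves;
    }

    // Assumes doc IDs arrive in increasing order, so only the upper bound needs checking.
    public Leaf leafFor(int docId) {
        if (current == null || docId >= current.docBase + current.maxDoc) {
            current = leaves[subIndex(docId)];
        }
        return current;
    }

    public int searches() {
        return searches;
    }

    // Binary search for the last leaf whose docBase is <= docId.
    private int subIndex(int docId) {
        searches++;
        int lo = 0;
        int hi = leaves.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (leaves[mid].docBase <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }
}
```

With sorted hits, the search runs once per segment that contains a hit rather than once per hit.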
So the problem currently is that ScriptDocValues may reuse the values internally, so reusing the same ScriptDocValues inside a segment is not allowed.
One solution is to change all ScriptDocValues to never reuse the values internally, but I think we should do it the other way around and make the DocValueFieldsFetchSubPhase clone the values of the ScriptDocValues for each docID. I think it's better to do it this way since the fetch sub phase is not supposed to hit many documents, whereas the aggregation that uses the ScriptDocValues will hit them all?
I was thinking the same until I realized that both numbers and strings do not reuse, even though they are probably the most common types one would use in scripts. On the other hand, dates and geo points reuse objects even though they are probably less commonly used in scripts. Maybe we should just align them with strings and numbers?
My latest commit changes dates and geo points to not reuse objects.
I'm wondering whether we should keep things this way here and do the cloning in get(int index)/getValue() to help GC by having even shorter lived objects, and potentially make escape analysis more likely to not ever create those objects.
Sure that's fine. Your last comment regarding GC is also a solution, we could not reuse objects and make sure that we don't create them when it's not needed (lazy creation on get).
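The reuse hazard discussed above can be sketched as follows. `Point`, `ReusingValues`, and `FreshValues` are hypothetical illustrations, not the actual ScriptDocValues classes: the first hands out one shared mutable object for every document, the second only allocates when `getValue()` is called.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseHazard {
    // Hypothetical mutable value, standing in for something like a geo point.
    public static final class Point {
        public double lat;
        public double lon;

        public Point(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Reuses one Point across documents: no allocation, but earlier results get overwritten.
    public static final class ReusingValues {
        private final double[][] data;
        private final Point scratch = new Point(0, 0);
        private int doc;

        public ReusingValues(double[][] data) {
            this.data = data;
        }

        public void setNextDocId(int doc) {
            this.doc = doc;
        }

        public Point getValue() {
            scratch.lat = data[doc][0];
            scratch.lon = data[doc][1];
            return scratch;
        }
    }

    // Creates a new Point lazily on each get, so callers may safely keep the result.
    public static final class FreshValues {
        private final double[][] data;
        private int doc;

        public FreshValues(double[][] data) {
            this.data = data;
        }

        public void setNextDocId(int doc) {
            this.doc = doc;
        }

        public Point getValue() {
            return new Point(data[doc][0], data[doc][1]);
        }
    }

    // Collects one value per document, as a fetch phase holding on to results would.
    public static List<Point> collectReusing(double[][] data) {
        ReusingValues values = new ReusingValues(data);
        List<Point> out = new ArrayList<>();
        for (int doc = 0; doc < data.length; doc++) {
            values.setNextDocId(doc);
            out.add(values.getValue());
        }
        return out;
    }

    public static List<Point> collectFresh(double[][] data) {
        FreshValues values = new FreshValues(data);
        List<Point> out = new ArrayList<>();
        for (int doc = 0; doc < data.length; doc++) {
            values.setNextDocId(doc);
            out.add(values.getValue());
        }
        return out;
    }
}
```

In the reusing variant every collected entry is the same object holding the last document's value, which is why sharing one instance across hits is only safe when values are either cloned by the caller or never reused internally.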
I think the BinaryScriptDocValues reuses the BytesRef as well, so it needs cloning too?
We don't use the BinaryScriptDocValues directly to retrieve doc values, so it should be fine. Though I am not sure that it won't be a problem later, so I think it would be good to clearly mark the intention in the javadocs. I think it's dangerous to rely on the fact that some ScriptDocValues can reuse and some can't.
+1
Thanks @jimczi, there are still some failing REST tests that I am working through, so I might ping for review again if fixing those gets complex enough to warrant a review.
Closes #24986