Fix ArrayIndexOutOfBoundsException in fetch phase with partial results#144385
Fix ArrayIndexOutOfBoundsException in fetch phase with partial results#144385reugn merged 12 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
|
Hi @reugn, I've created a changelog YAML for you. |
| } | ||
|
|
||
| private static SearchHit[] stripNulls(SearchHit[] searchHits) { | ||
| return Arrays.stream(searchHits).filter(Objects::nonNull).toArray(SearchHit[]::new); |
There was a problem hiding this comment.
drive by comment: Is there a certain overhead of using streams here, especially in the case when there are no nulls to strip? Would it be worth testing something like a for loop that counts null elements, and if that count is > 0, a new compact array is created, and returned?
There was a problem hiding this comment.
Refactored to return the original array when there are no nulls. Streams are only used in the fallback path where reallocation is necessary.
| IndexReader reader = writer.getReader(); | ||
| writer.close(); | ||
|
|
||
| ContextIndexSearcher searcher = new ContextIndexSearcher(reader, null, null, new QueryCachingPolicy() { |
There was a problem hiding this comment.
could use a TrivialQueryCachingPolicy.NEVER here?
| final int index = fetchResult.counterGetAndIncrement(); | ||
| assert index < fetchResult.hits().getHits().length | ||
| : "not enough hits fetched. index [" + index + "] length: " + fetchResult.hits().getHits().length; | ||
| if (index >= fetchResult.hits().getHits().length) { |
There was a problem hiding this comment.
is this codepath tested?
There was a problem hiding this comment.
Yes, there is the SearchPhaseControllerTests.testMergeWithPartialFetchResults.
There was a problem hiding this comment.
This is in SearchPhaseController.mergeSuggest() - when setting a breakpoint in that method I don't see it getting called when running testMergeWithPartialFetchResults - am I missing something? :-)
spinscale
left a comment
There was a problem hiding this comment.
LGTM, left two minor comments.
elastic#144385) When a search fetch phase times out on a shard with allow_partial_search_results: true, the coordinating node can hit an ArrayIndexOutOfBoundsException in SearchPhaseController.getHits() or SearchPhaseController.mergeSuggest(). This is caused by: FetchPhaseDocsIterator.iterate() — on timeout, used System.arraycopy(searchHits, 0, partial, 0, i) to build the partial result. Because docs are iterated in doc-id order but placed at their original (score-sorted) positions, this copy produced a corrupted array. It included nulls at unfilled positions and dropped valid hits stored beyond index i. Fixed by replacing the arraycopy with a stripNulls method that returns a compact array of only the successfully fetched hits. SearchPhaseController.getHits() / mergeSuggest() — the query phase promises N docs from a shard, but the fetch phase may return fewer after a timeout. The merge loop used a counter to index into the fetch result array without checking bounds, causing ArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries when the counter exceeds the available hits.
* upstream/main: (146 commits) Revert "[Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)" Fix ArrayIndexOutOfBoundsException in fetch phase with partial results (elastic#144385) ESQL: Correctly manage NULL data type for SUM (elastic#144942) [ESQL] Fixes GroupedTopNBenchmark not executing (elastic#144944) Fix reader context leak when query response serialization fails (elastic#144708) Validate individual offset values in BULK_OFFSETS bounds checks (elastic#144643) Merge main21 source set into main in simdvec (elastic#144921) [TEST] Unmute TsidExtractingIdFieldMapperTests (elastic#144848) [Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539) Fix `ThreadedActionListenerTests#testRejectionHandling` (elastic#144795) Add new DLM Frozen Tier Transition execution plugin and service (elastic#144595) Prometheus: execute query_range via parsed EsqlStatement plan (elastic#144416) Investigate `testBulkIndexingRequestSplitting` failure (elastic#144766) Add test utility for wrapping directories in FilterDirectory layer (elastic#143563) Fix ES|QL decay tests with negative scale (elastic#144657) Fix circuit breaker leak in percolator query construction (elastic#144827) Use XPerFieldDocValuesFormat in AbstractTSDBSyntheticIdCodec (elastic#144744) [DOCS] Document how reindex work in CPS (elastic#144016) Fix Int4 vector library tests failing on Java 21 (elastic#144830) [DiskBBQ] Fix index sorting on flush (elastic#144938) ...
elastic#144385) When a search fetch phase times out on a shard with allow_partial_search_results: true, the coordinating node can hit an ArrayIndexOutOfBoundsException in SearchPhaseController.getHits() or SearchPhaseController.mergeSuggest(). This is caused by: FetchPhaseDocsIterator.iterate() — on timeout, used System.arraycopy(searchHits, 0, partial, 0, i) to build the partial result. Because docs are iterated in doc-id order but placed at their original (score-sorted) positions, this copy produced a corrupted array. It included nulls at unfilled positions and dropped valid hits stored beyond index i. Fixed by replacing the arraycopy with a stripNulls method that returns a compact array of only the successfully fetched hits. SearchPhaseController.getHits() / mergeSuggest() — the query phase promises N docs from a shard, but the fetch phase may return fewer after a timeout. The merge loop used a counter to index into the fetch result array without checking bounds, causing ArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries when the counter exceeds the available hits.
#144385) (#145002) When a search fetch phase times out on a shard with allow_partial_search_results: true, the coordinating node can hit an ArrayIndexOutOfBoundsException in SearchPhaseController.getHits() or SearchPhaseController.mergeSuggest(). This is caused by: FetchPhaseDocsIterator.iterate() — on timeout, used System.arraycopy(searchHits, 0, partial, 0, i) to build the partial result. Because docs are iterated in doc-id order but placed at their original (score-sorted) positions, this copy produced a corrupted array. It included nulls at unfilled positions and dropped valid hits stored beyond index i. Fixed by replacing the arraycopy with a stripNulls method that returns a compact array of only the successfully fetched hits. SearchPhaseController.getHits() / mergeSuggest() — the query phase promises N docs from a shard, but the fetch phase may return fewer after a timeout. The merge loop used a counter to index into the fetch result array without checking bounds, causing ArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries when the counter exceeds the available hits.
#144385) (#145003) When a search fetch phase times out on a shard with allow_partial_search_results: true, the coordinating node can hit an ArrayIndexOutOfBoundsException in SearchPhaseController.getHits() or SearchPhaseController.mergeSuggest(). This is caused by: FetchPhaseDocsIterator.iterate() — on timeout, used System.arraycopy(searchHits, 0, partial, 0, i) to build the partial result. Because docs are iterated in doc-id order but placed at their original (score-sorted) positions, this copy produced a corrupted array. It included nulls at unfilled positions and dropped valid hits stored beyond index i. Fixed by replacing the arraycopy with a stripNulls method that returns a compact array of only the successfully fetched hits. SearchPhaseController.getHits() / mergeSuggest() — the query phase promises N docs from a shard, but the fetch phase may return fewer after a timeout. The merge loop used a counter to index into the fetch result array without checking bounds, causing ArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries when the counter exceeds the available hits.
#144385) (#145004) When a search fetch phase times out on a shard with allow_partial_search_results: true, the coordinating node can hit an ArrayIndexOutOfBoundsException in SearchPhaseController.getHits() or SearchPhaseController.mergeSuggest(). This is caused by: FetchPhaseDocsIterator.iterate() — on timeout, used System.arraycopy(searchHits, 0, partial, 0, i) to build the partial result. Because docs are iterated in doc-id order but placed at their original (score-sorted) positions, this copy produced a corrupted array. It included nulls at unfilled positions and dropped valid hits stored beyond index i. Fixed by replacing the arraycopy with a stripNulls method that returns a compact array of only the successfully fetched hits. SearchPhaseController.getHits() / mergeSuggest() — the query phase promises N docs from a shard, but the fetch phase may return fewer after a timeout. The merge loop used a counter to index into the fetch result array without checking bounds, causing ArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries when the counter exceeds the available hits.
elastic#144385) When a search fetch phase times out on a shard with allow_partial_search_results: true, the coordinating node can hit an ArrayIndexOutOfBoundsException in SearchPhaseController.getHits() or SearchPhaseController.mergeSuggest(). This is caused by: FetchPhaseDocsIterator.iterate() — on timeout, used System.arraycopy(searchHits, 0, partial, 0, i) to build the partial result. Because docs are iterated in doc-id order but placed at their original (score-sorted) positions, this copy produced a corrupted array. It included nulls at unfilled positions and dropped valid hits stored beyond index i. Fixed by replacing the arraycopy with a stripNulls method that returns a compact array of only the successfully fetched hits. SearchPhaseController.getHits() / mergeSuggest() — the query phase promises N docs from a shard, but the fetch phase may return fewer after a timeout. The merge loop used a counter to index into the fetch result array without checking bounds, causing ArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries when the counter exceeds the available hits.
elastic#144385) When a search fetch phase times out on a shard with allow_partial_search_results: true, the coordinating node can hit an ArrayIndexOutOfBoundsException in SearchPhaseController.getHits() or SearchPhaseController.mergeSuggest(). This is caused by: FetchPhaseDocsIterator.iterate() — on timeout, used System.arraycopy(searchHits, 0, partial, 0, i) to build the partial result. Because docs are iterated in doc-id order but placed at their original (score-sorted) positions, this copy produced a corrupted array. It included nulls at unfilled positions and dropped valid hits stored beyond index i. Fixed by replacing the arraycopy with a stripNulls method that returns a compact array of only the successfully fetched hits. SearchPhaseController.getHits() / mergeSuggest() — the query phase promises N docs from a shard, but the fetch phase may return fewer after a timeout. The merge loop used a counter to index into the fetch result array without checking bounds, causing ArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries when the counter exceeds the available hits.
elastic#144385) When a search fetch phase times out on a shard with allow_partial_search_results: true, the coordinating node can hit an ArrayIndexOutOfBoundsException in SearchPhaseController.getHits() or SearchPhaseController.mergeSuggest(). This is caused by: FetchPhaseDocsIterator.iterate() — on timeout, used System.arraycopy(searchHits, 0, partial, 0, i) to build the partial result. Because docs are iterated in doc-id order but placed at their original (score-sorted) positions, this copy produced a corrupted array. It included nulls at unfilled positions and dropped valid hits stored beyond index i. Fixed by replacing the arraycopy with a stripNulls method that returns a compact array of only the successfully fetched hits. SearchPhaseController.getHits() / mergeSuggest() — the query phase promises N docs from a shard, but the fetch phase may return fewer after a timeout. The merge loop used a counter to index into the fetch result array without checking bounds, causing ArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries when the counter exceeds the available hits.
When a search fetch phase times out on a shard with
allow_partial_search_results: true, the coordinating node can hit anArrayIndexOutOfBoundsExceptioninSearchPhaseController.getHits()orSearchPhaseController.mergeSuggest(). This is caused by:FetchPhaseDocsIterator.iterate()— on timeout, usedSystem.arraycopy(searchHits, 0, partial, 0, i)to build the partial result. Because docs are iterated in doc-id order but placed at their original (score-sorted) positions, this copy produced a corrupted array. It included nulls at unfilled positions and dropped valid hits stored beyond indexi. Fixed by replacing thearraycopywith astripNullsmethod that returns a compact array of only the successfully fetched hits.SearchPhaseController.getHits()/mergeSuggest()— the query phase promises N docs from a shard, but the fetch phase may return fewer after a timeout. The merge loop used a counter to index into the fetch result array without checking bounds, causingArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries when the counter exceeds the available hits.Tests
FetchPhaseDocsIteratorTests.testTimeoutReturnsCompactPartialResults— verifies thatiterate()returns a compact array (no nulls, shorter than input) when a timeout occurs mid-fetch with unsorted doc ids.SearchPhaseControllerTests.testMergeWithPartialFetchResults— verifies thatSearchPhaseController.merge()handles a shard returning fewer fetch hits than expected without throwingArrayIndexOutOfBoundsException.Fixes #140495