
Fix ArrayIndexOutOfBoundsException in fetch phase with partial results #144385

Merged
reugn merged 12 commits into elastic:main from reugn:fix/array-index-out-of-bounds on Mar 26, 2026

Conversation

@reugn
Member

@reugn reugn commented Mar 17, 2026

When the fetch phase of a search times out on a shard with allow_partial_search_results: true, the coordinating node can hit an ArrayIndexOutOfBoundsException in SearchPhaseController.getHits() or SearchPhaseController.mergeSuggest(). There are two causes:

  • FetchPhaseDocsIterator.iterate(): on timeout, it built the partial result with System.arraycopy(searchHits, 0, partial, 0, i). Docs are iterated in doc-id order, but each hit is stored at its original (score-sorted) position, so the first i slots are not necessarily the i hits fetched so far. The copy therefore produced a corrupted array: it could include nulls at unfilled positions and drop valid hits stored beyond index i. Fixed by replacing the arraycopy with a stripNulls method that returns a compact array containing only the successfully fetched hits.

  • SearchPhaseController.getHits() / mergeSuggest(): the query phase promises N docs from a shard, but after a timeout the fetch phase may return fewer. The merge loop used a per-shard counter to index into the fetch result array without a bounds check, which caused the ArrayIndexOutOfBoundsException. Fixed by adding a bounds check that skips entries once the counter reaches the number of available hits.
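The first bullet is easier to see with a concrete run. The Java sketch below is a hypothetical, self-contained model and not the Elasticsearch source. Strings stand in for SearchHit, and the class and method names (PartialFetchDemo, fetchWithTimeout, prefixCopy) are made up for illustration. The model fills hits in doc-id order at their score-sorted slots, stops after a simulated timeout, and then compares the old prefix copy with null stripping.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical model of the timeout path described above; Strings stand in for SearchHit.
final class PartialFetchDemo {

    // Visits slots in doc-id order, writes each hit to its original (score-sorted) slot,
    // and stops after `fetchedBeforeTimeout` docs to simulate a timeout.
    static String[] fetchWithTimeout(int[] docIdsInScoreOrder, int fetchedBeforeTimeout) {
        int n = docIdsInScoreOrder.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            order[k] = k;
        }
        // Sort slot indices by the doc id stored in each slot.
        Arrays.sort(order, (a, b) -> Integer.compare(docIdsInScoreOrder[a], docIdsInScoreOrder[b]));
        String[] hits = new String[n];
        for (int i = 0; i < fetchedBeforeTimeout; i++) {
            int slot = order[i];
            hits[slot] = "doc" + docIdsInScoreOrder[slot];
        }
        return hits;
    }

    // Old approach: copy the first i slots, whether or not they were filled.
    static String[] prefixCopy(String[] hits, int i) {
        String[] partial = new String[i];
        System.arraycopy(hits, 0, partial, 0, i);
        return partial;
    }

    // Fixed approach: keep every fetched hit (in slot order) and drop unfilled slots.
    static String[] stripNulls(String[] hits) {
        return Arrays.stream(hits).filter(Objects::nonNull).toArray(String[]::new);
    }
}
```

Take score-ordered doc ids {42, 7, 19, 3} and a timeout after two fetches. The hits array becomes [null, "doc7", null, "doc3"]. The prefix copy returns [null, "doc7"], which contains a null and loses "doc3". stripNulls returns ["doc7", "doc3"].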

Tests

  • FetchPhaseDocsIteratorTests.testTimeoutReturnsCompactPartialResults — verifies that iterate() returns a compact array (no nulls, shorter than input) when a timeout occurs mid-fetch with unsorted doc ids.
  • SearchPhaseControllerTests.testMergeWithPartialFetchResults — verifies that SearchPhaseController.merge() handles a shard returning fewer fetch hits than expected without throwing ArrayIndexOutOfBoundsException.
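The merge-side scenario, where a shard returns fewer fetch hits than the query phase promised, can be modeled in a few lines. This is a hypothetical sketch, not SearchPhaseController. PartialMergeDemo, ShardFetch and merge are illustrative names. Only counterGetAndIncrement mirrors a method name that appears in the diff under review. The merged order is given as the shard index of each top doc.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the coordinating-node merge loop; names are illustrative.
final class PartialMergeDemo {

    // Per-shard fetch result: the hits actually returned plus a read cursor.
    static final class ShardFetch {
        final String[] hits;
        private int counter = 0;

        ShardFetch(String[] hits) {
            this.hits = hits;
        }

        int counterGetAndIncrement() {
            return counter++;
        }
    }

    // sortedShardIndices lists, in merged order, the shard each promised top doc came from.
    static List<String> merge(int[] sortedShardIndices, ShardFetch[] fetches) {
        List<String> merged = new ArrayList<>();
        for (int shard : sortedShardIndices) {
            ShardFetch fetch = fetches[shard];
            int index = fetch.counterGetAndIncrement();
            if (index >= fetch.hits.length) {
                // This shard returned fewer hits than promised (e.g. fetch timeout):
                // skip the entry instead of throwing ArrayIndexOutOfBoundsException.
                continue;
            }
            merged.add(fetch.hits[index]);
        }
        return merged;
    }
}
```

Suppose shard 1 was promised three docs but returned only "b0". With the merged order {0, 1, 1, 0, 1}, the loop yields ["a0", "b0", "a1"] and skips the two missing entries.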

Fixes #140495

@reugn reugn added >bug, priority:normal, Team:Search Foundations and :Search Foundations/Search labels Mar 17, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine
Collaborator

Hi @reugn, I've created a changelog YAML for you.

@reugn reugn added the auto-backport label Mar 17, 2026
@reugn reugn requested a review from drempapis March 17, 2026 14:31

private static SearchHit[] stripNulls(SearchHit[] searchHits) {
    return Arrays.stream(searchHits).filter(Objects::nonNull).toArray(SearchHit[]::new);
}
Contributor

Drive-by comment: is there some overhead to using streams here, especially when there are no nulls to strip? Would it be worth testing a for loop that counts null elements and, only if that count is > 0, creates and returns a new compact array?

Contributor

+1

Member Author

Refactored to return the original array when there are no nulls. Streams are only used in the fallback path where reallocation is necessary.
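The fast-path idea from this thread can be sketched as follows. This is a hypothetical illustration, not the merged code. A generic T stands in for SearchHit. The fallback here uses a plain loop, following the reviewer's count-then-copy suggestion, rather than the stream the author mentions. StripNullsSketch is a made-up name.

```java
import java.util.Arrays;

// Hypothetical sketch: generic stand-in for stripNulls(SearchHit[]) with a no-allocation fast path.
final class StripNullsSketch {

    static <T> T[] stripNulls(T[] items) {
        int nulls = 0;
        for (T item : items) {
            if (item == null) {
                nulls++;
            }
        }
        if (nulls == 0) {
            // Common case (no timeout): return the same array without allocating.
            return items;
        }
        // Partial-result case: allocate an array of the same runtime type, sized for the
        // non-null entries only, then overwrite every slot with those entries in order.
        T[] compact = Arrays.copyOf(items, items.length - nulls);
        int j = 0;
        for (T item : items) {
            if (item != null) {
                compact[j++] = item;
            }
        }
        return compact;
    }
}
```

Arrays.copyOf is used here only to get an array with the right runtime type and length. Every slot is then overwritten, so any nulls it copied from the input do not survive.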

IndexReader reader = writer.getReader();
writer.close();

ContextIndexSearcher searcher = new ContextIndexSearcher(reader, null, null, new QueryCachingPolicy() {
Contributor

could use a TrivialQueryCachingPolicy.NEVER here?

final int index = fetchResult.counterGetAndIncrement();
assert index < fetchResult.hits().getHits().length
: "not enough hits fetched. index [" + index + "] length: " + fetchResult.hits().getHits().length;
if (index >= fetchResult.hits().getHits().length) {
Contributor

is this codepath tested?

Member Author

Yes, there is the SearchPhaseControllerTests.testMergeWithPartialFetchResults.

Contributor

This is in SearchPhaseController.mergeSuggest(). When I set a breakpoint in that method, I don't see it being called when running testMergeWithPartialFetchResults. Am I missing something? :-)

Contributor

@spinscale spinscale left a comment

LGTM, left two minor comments.

@reugn reugn merged commit cc5c3ad into elastic:main Mar 26, 2026
36 checks passed
@reugn reugn deleted the fix/array-index-out-of-bounds branch March 26, 2026 12:08
@elasticsearchmachine
Collaborator

💚 Backport successful

Backported branches:
  • 8.19
  • 9.2
  • 9.3

reugn added a commit to reugn/elasticsearch that referenced this pull request Mar 26, 2026
elastic#144385)

szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 26, 2026
* upstream/main: (146 commits)
  Revert "[Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)"
  Fix ArrayIndexOutOfBoundsException in fetch phase with partial results (elastic#144385)
  ESQL: Correctly manage NULL data type for SUM (elastic#144942)
  [ESQL] Fixes GroupedTopNBenchmark not executing (elastic#144944)
  Fix reader context leak when query response serialization fails (elastic#144708)
  Validate individual offset values in BULK_OFFSETS bounds checks (elastic#144643)
  Merge main21 source set into main in simdvec (elastic#144921)
  [TEST] Unmute TsidExtractingIdFieldMapperTests (elastic#144848)
  [Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)
  Fix `ThreadedActionListenerTests#testRejectionHandling` (elastic#144795)
  Add new DLM Frozen Tier Transition execution plugin and service (elastic#144595)
  Prometheus: execute query_range via parsed EsqlStatement plan (elastic#144416)
  Investigate `testBulkIndexingRequestSplitting` failure (elastic#144766)
  Add test utility for wrapping directories in FilterDirectory layer (elastic#143563)
  Fix ES|QL decay tests with negative scale (elastic#144657)
  Fix circuit breaker leak in percolator query construction (elastic#144827)
  Use XPerFieldDocValuesFormat in AbstractTSDBSyntheticIdCodec (elastic#144744)
  [DOCS] Document how reindex work in CPS (elastic#144016)
  Fix Int4 vector library tests failing on Java 21 (elastic#144830)
  [DiskBBQ] Fix index sorting on flush (elastic#144938)
  ...
reugn added a commit to reugn/elasticsearch that referenced this pull request Mar 26, 2026
elastic#144385)

elasticsearchmachine pushed a commit that referenced this pull request Mar 26, 2026
#144385) (#145002)

elasticsearchmachine pushed a commit that referenced this pull request Mar 26, 2026
#144385) (#145003)

elasticsearchmachine pushed a commit that referenced this pull request Mar 26, 2026
#144385) (#145004)

seanzatzdev pushed a commit to seanzatzdev/elasticsearch that referenced this pull request Mar 26, 2026
elastic#144385)

seanzatzdev pushed a commit to seanzatzdev/elasticsearch that referenced this pull request Mar 27, 2026
elastic#144385)

mamazzol pushed a commit to mamazzol/elasticsearch that referenced this pull request Mar 30, 2026
elastic#144385)


Labels

auto-backport, >bug, priority:normal, :Search Foundations/Search, Team:Search Foundations, v8.19.0, v9.2.0, v9.3.0, v9.4.0


Development

Successfully merging this pull request may close these issues.

array_index_out_of_bounds_exception when elastic search request is huge in size

4 participants