fix(logsdb): batch bulk indexing to prevent OOM in challenge tests #139770

Merged
salvatore-campagna merged 3 commits into elastic:main from salvatore-campagna:fix/4573-bulk-challenge-oom
Dec 19, 2025

@salvatore-campagna salvatore-campagna commented Dec 18, 2025

Fixes flaky OOM test failure in BulkDynamicMappingChallengeRestIT.testEsqlTermsAggregation.

The test was failing intermittently with java.lang.OutOfMemoryError: Java heap space when building bulk request strings for large randomly-generated documents.

When indexing documents, the test built a single bulk request string containing every document. With complex mappings, even a modest number of documents could make that string large enough to exhaust the heap.

Changes include:

  1. Batch bulk indexing (BulkChallengeRestIT.java):

    • Documents are now indexed in batches of 20 instead of all at once
    • Extracted common batching logic into indexDocumentsInBatches() method
    • Reuses StringBuilder with setLength(0) instead of creating new instances
  2. Disable ML (AbstractChallengeRestTest.java):

    • Added .setting("xpack.ml.enabled", "false") since these tests don't require ML
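
For readers who want to see the shape of the batching change, here is a minimal standalone sketch. It is not the code from this PR: the class name `BulkBatchingSketch`, the method signature, the `Consumer<String>` used in place of a real REST call, and the `create` action line are all assumptions made for illustration. Only the batch size of 20, the `indexDocumentsInBatches` name, and the `setLength(0)` reuse come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BulkBatchingSketch {
    // Batch size taken from the PR description.
    static final int BATCH_SIZE = 20;

    /**
     * Builds bulk request bodies of at most batchSize documents each and hands
     * every body to sendBulk (hypothetical stand-in for the real bulk REST call).
     * Returns the number of bulk bodies that were sent.
     */
    public static int indexDocumentsInBatches(List<String> jsonDocs, int batchSize, Consumer<String> sendBulk) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        StringBuilder sb = new StringBuilder();
        int batches = 0;
        int inBatch = 0;
        int id = 0;
        for (String doc : jsonDocs) {
            // Action line followed by the document source, newline-delimited.
            // The "create" action and numeric ids are assumptions for this sketch.
            sb.append("{ \"create\": { \"_id\": \"").append(id++).append("\" } }\n");
            sb.append(doc).append('\n');
            if (++inBatch == batchSize) {
                sendBulk.accept(sb.toString());
                sb.setLength(0); // reuse the builder instead of allocating a new one
                inBatch = 0;
                batches++;
            }
        }
        if (inBatch > 0) {
            // Flush the final, partially filled batch.
            sendBulk.accept(sb.toString());
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 45; i++) {
            docs.add("{\"value\":" + i + "}");
        }
        List<String> sent = new ArrayList<>();
        int batches = indexDocumentsInBatches(docs, BATCH_SIZE, sent::add);
        System.out.println(batches + " bulk requests");
    }
}
```

With 45 documents and a batch size of 20, the sketch sends three bodies (20, 20, and 5 documents), so the largest string held in memory at any time covers 20 documents rather than all 45.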

Closes #138717

Large randomly-generated documents can cause OOM when building a single
bulk request string for all documents. This change batches documents
(20 per batch) to reduce peak memory usage.

Also disables ML in the test cluster since these tests don't require it.
@salvatore-campagna salvatore-campagna self-assigned this Dec 18, 2025
@salvatore-campagna salvatore-campagna added >test-failure Triaged test failures from CI Team:StorageEngine labels Dec 18, 2025
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label needs:risk Requires assignment of a risk label (low, medium, blocker) v9.4.0 and removed Team:StorageEngine labels Dec 18, 2025
@salvatore-campagna salvatore-campagna added Team:StorageEngine and removed needs:triage Requires assignment of a team area label needs:risk Requires assignment of a risk label (low, medium, blocker) labels Dec 18, 2025
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label needs:risk Requires assignment of a risk label (low, medium, blocker) and removed Team:StorageEngine labels Dec 18, 2025
@salvatore-campagna salvatore-campagna added Team:StorageEngine :StorageEngine/Logs You know, for Logs >test Issues or PRs that are addressing/adding tests and removed >test-failure Triaged test failures from CI needs:triage Requires assignment of a team area label needs:risk Requires assignment of a risk label (low, medium, blocker) labels Dec 18, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@salvatore-campagna salvatore-campagna requested a review from a team December 19, 2025 07:47
Member

@martijnvg martijnvg left a comment

LGTM

.setting("xpack.security.autoconfiguration.enabled", "false")
.setting("xpack.license.self_generated.type", "trial")
.setting("cluster.logsdb.enabled", "true")
.setting("xpack.ml.enabled", "false")
Member

I don't think this will really affect test coverage. This isn't like disabling security.

Contributor Author

Right... I guess it is still useful anyway, since ML is not really needed here and disabling it makes it possible to reproduce the test locally. When I tried to run the test locally it was failing with:

[2025-12-18T07:11:59,849][ERROR][o.e.b.Elasticsearch      ] [test-cluster-0] fatal exception while booting Elasticsearch org.elasticsearch.ElasticsearchException: Failure running machine learning native code. This could be due to running on an unsupported OS or distribution, missing OS libraries, or a problem with the temp directory. To bypass this problem by running Elasticsearch without machine learning functionality set [xpack.ml.enabled: false].

@salvatore-campagna salvatore-campagna merged commit e3fb948 into elastic:main Dec 19, 2025
35 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Dec 19, 2025
Labels

:StorageEngine/Logs You know, for Logs Team:StorageEngine >test Issues or PRs that are addressing/adding tests v9.4.0

Development

Successfully merging this pull request may close these issues.

[CI] BulkDynamicMappingChallengeRestIT testEsqlTermsAggregation failing

3 participants