@nibix nibix commented Dec 1, 2025

Description

In the class BulkRequest, the indices property provides a shortcut for retrieving the names of all indices referenced in the bulk items.

The index names are added to the Set whenever one of the add() methods is called. However, the property is not serialized, and the deserialization constructor does not re-initialize it either. As a result, the property is an empty set after the request has been serialized and deserialized.
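The effect can be reproduced outside OpenSearch with a minimal sketch. Everything below is hypothetical: `MiniBulkRequest` is a stand-in that uses plain `java.io` streams instead of the OpenSearch stream classes, and the `rebuildIndices` flag exists only to show the behaviour before and after the fix side by side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for BulkRequest; this is not the OpenSearch class.
final class MiniBulkRequest {
    private final List<String> itemIndices = new ArrayList<>(); // one entry per bulk item
    private final Set<String> indices = new HashSet<>();        // derived shortcut, never written to the stream

    MiniBulkRequest() {}

    // Deserialization constructor. With rebuildIndices == false it mimics the bug.
    MiniBulkRequest(DataInputStream in, boolean rebuildIndices) throws IOException {
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            itemIndices.add(in.readUTF());
        }
        if (rebuildIndices) {
            indices.addAll(itemIndices); // re-derive the set from the deserialized items
        }
    }

    MiniBulkRequest add(String index) {
        itemIndices.add(index);
        indices.add(index); // the set is only maintained here
        return this;
    }

    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(itemIndices.size());
        for (String index : itemIndices) {
            out.writeUTF(index);
        }
    }

    Set<String> getIndices() {
        return indices;
    }

    // Serializes the request to bytes and reads it back.
    static MiniBulkRequest roundTrip(MiniBulkRequest request, boolean rebuildIndices) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                request.writeTo(out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return new MiniBulkRequest(in, rebuildIndices);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MiniBulkRequest original = new MiniBulkRequest().add("logs").add("metrics").add("logs");
        System.out.println(roundTrip(original, false).getIndices().isEmpty());                    // true
        System.out.println(roundTrip(original, true).getIndices().equals(original.getIndices())); // true
    }
}
```

The set is cheap to recompute from the items, so re-deriving it in the constructor avoids changing the wire format.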

Fortunately, this happens only in rare cases: specifically, when ingest pipelines are present and the request did not hit a node with the role "ingest", so that it has to be forwarded to an ingest node. Compare this code:

// Determine if any requests require ingest pipeline execution.
boolean hasIndexRequestsWithPipelines = resolvePipelinesForActionRequests(bulkRequest.requests, metadata, minNodeVersion);
if (hasIndexRequestsWithPipelines) {
    // If ingest pipeline execution is required, we will execute the pipelines first before other operations.
    // After pipeline execution, this method (doExecute) will be called again, but with the requests updated from ingest processing.
    // The execution will also update the pipelines to IngestService.NOOP_PIPELINE_NAME on each request.
    // This ensures that this on the second time through this method, we will not execute pipelines again.
    try {
        if (Assertions.ENABLED) {
            final boolean arePipelinesResolved = extractIndexRequests(bulkRequest.requests()).stream()
                .allMatch(IndexRequest::isPipelineResolved);
            assert arePipelinesResolved : bulkRequest;
        }
        if (clusterService.localNode().isIngestNode()) {
            processBulkIndexIngestRequest(task, bulkRequest, executorName, listener);
        } else {
            ingestForwarder.forwardIngestRequest(BulkAction.INSTANCE, bulkRequest, listener);
        }
    } catch (Exception e) {
        listener.onFailure(e);
    }
    return;
}
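The two-pass flow described in the comments above can be sketched in isolation. This is a simplified, synchronous illustration under stated assumptions: `TwoPassSketch`, `Item`, and the marker value `"_none"` are hypothetical names chosen for the example, and the real method re-enters asynchronously after ingest processing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two-pass flow; none of these names are OpenSearch APIs.
final class TwoPassSketch {
    // Illustrative marker meaning "no pipeline left to run" (assumed value).
    static final String NOOP_PIPELINE = "_none";

    static final class Item {
        String pipeline;

        Item(String pipeline) {
            this.pipeline = pipeline;
        }
    }

    // Returns a trace of the phases that ran, in order.
    static List<String> execute(List<Item> items) {
        List<String> trace = new ArrayList<>();
        doExecute(items, trace);
        return trace;
    }

    private static void doExecute(List<Item> items, List<String> trace) {
        boolean hasPipelines = items.stream().anyMatch(item -> !NOOP_PIPELINE.equals(item.pipeline));
        if (hasPipelines) {
            trace.add("ingest");
            // Mark every item as processed so the second pass skips ingest.
            for (Item item : items) {
                item.pipeline = NOOP_PIPELINE;
            }
            doExecute(items, trace); // second pass through the same method
            return;
        }
        trace.add("bulk");
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item("my-pipeline"), new Item(NOOP_PIPELINE));
        System.out.println(execute(items)); // [ingest, bulk]
    }
}
```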

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed bulk request deserialization so indices are correctly populated and tracked after a serialization round-trip.
  • Tests

    • Added a serialization/deserialization test to verify that indices are preserved after serializing and deserializing bulk requests.
  • Documentation

    • Added a changelog entry noting the deserialization fix for bulk requests.

@nibix nibix requested a review from a team as a code owner December 1, 2025 05:10

coderabbitai bot commented Dec 1, 2025

Walkthrough

Modified BulkRequest deserialization to populate the indices set from each deserialized DocWriteRequest; added a unit test for serialization/deserialization roundtrip and a changelog entry documenting the fix.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| BulkRequest deserialization: server/src/main/java/org/opensearch/action/bulk/BulkRequest.java | In BulkRequest(StreamInput in) deserializer, after handling version-specific fields, populate the indices set by mapping each stored DocWriteRequest to its index and adding it to indices. |
| Tests, serialization roundtrip: server/src/test/java/org/opensearch/action/bulk/TransportBulkActionTests.java | Added testSerializationDeserialization() which serializes a BulkRequest to a Bytes stream with a specified version, deserializes it back, and asserts the indices set contains the expected index; added necessary test utilities/imports. |
| Changelog: CHANGELOG.md | Added Unreleased 3.x Fixed entry: "Fixed handling of property index in BulkRequest during deserialization (PR #20132)". |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Review focus:
    • BulkRequest(StreamInput in) — verify indices aggregation logic and version compatibility.
    • TransportBulkActionTests.testSerializationDeserialization() — ensure test resources and stream/version usage are correct and do not leak.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and specifically describes the main fix: ensuring the indices Set is properly initialized during BulkRequest deserialization from a TransportRequest stream. |
| Description check | ✅ Passed | The description provides a detailed explanation of the issue, includes the reason it occurs rarely, references relevant code, and confirms testing was added per the checklist. |
📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8ad58b and d946f19.

📒 Files selected for processing (1)
  • server/src/test/java/org/opensearch/action/bulk/TransportBulkActionTests.java (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/src/test/java/org/opensearch/action/bulk/TransportBulkActionTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: Analyze (java)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)

…being deserialized from TransportRequest stream

Signed-off-by: Nils Bandener <[email protected]>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java (1)

106-116: Deserialization now correctly reconstructs indices set

Populating indices from requests after deserialization aligns with how all add/internalAdd paths already maintain this set and directly addresses the bug where indices was empty after transport. This is functionally correct and side‑effect free (constructor starts with an empty indices set).

If you want to slightly simplify and reduce coupling to DocRequest, you could consider an equivalent loop, but this is purely optional:

-        requests.stream().map(DocRequest::index).forEach(indices::add);
+        for (DocWriteRequest<?> request : requests) {
+            indices.add(request.index());
+        }
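That the two forms in the suggestion build the same set can be checked with a small self-contained sketch. The names here are hypothetical: `HasIndex` merely plays the role of a request type that exposes index(), and `IndexCollectors` is not part of OpenSearch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins; HasIndex plays the role of a request type exposing index().
final class IndexCollectors {
    interface HasIndex {
        String index();
    }

    static HasIndex item(String index) {
        return () -> index;
    }

    // Stream form, as in the removed line of the diff.
    static Set<String> viaStream(List<? extends HasIndex> requests) {
        Set<String> indices = new HashSet<>();
        requests.stream().map(HasIndex::index).forEach(indices::add);
        return indices;
    }

    // Loop form, as in the added lines of the diff.
    static Set<String> viaLoop(List<? extends HasIndex> requests) {
        Set<String> indices = new HashSet<>();
        for (HasIndex request : requests) {
            indices.add(request.index());
        }
        return indices;
    }

    public static void main(String[] args) {
        List<HasIndex> requests = List.of(item("index1"), item("index2"), item("index1"));
        System.out.println(viaStream(requests).equals(viaLoop(requests))); // true
    }
}
```

Both variants add into an existing mutable set, so duplicates collapse in the same way and the choice between them is a matter of readability.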
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 1ee30dc and 2738fa2.

📒 Files selected for processing (1)
  • server/src/main/java/org/opensearch/action/bulk/BulkRequest.java (2 hunks)
🔇 Additional comments (1)
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java (1)

35-43: Import of DocRequest is appropriate and scoped

The new DocRequest import is only used for the method reference in the deserialization constructor and keeps the file’s dependency surface aligned with existing request types. No issues here.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java (1)

115-115: LGTM! Correctly populates the indices set after deserialization.

The logic mirrors the behavior of the add() methods and ensures that the indices set is properly initialized when deserializing a BulkRequest.

Optional: Consider a traditional for-loop for improved readability:

-        requests.stream().map(DocRequest::index).forEach(indices::add);
+        for (DocWriteRequest<?> request : requests) {
+            indices.add(request.index());
+        }
server/src/test/java/org/opensearch/action/bulk/TransportBulkActionTests.java (1)

387-397: LGTM! Test validates the deserialization fix.

The test correctly verifies that the indices set is populated after serialization and deserialization of a BulkRequest.

Optional: Consider enhancing test coverage by including multiple requests with different indices:

 public void testSerializationDeserialization() throws Exception {
-    BulkRequest bulkRequest = new BulkRequest().add(new IndexRequest("index").id("id").source(Collections.emptyMap()));
+    BulkRequest bulkRequest = new BulkRequest()
+        .add(new IndexRequest("index1").id("id1").source(Collections.emptyMap()))
+        .add(new IndexRequest("index2").id("id2").source(Collections.emptyMap()))
+        .add(new DeleteRequest("index1", "id3"));
     MockBigArrays mockBigArrays = new MockBigArrays(new MockPageCacheRecycler(Settings.EMPTY), new NoneCircuitBreakerService());
 
     try (ReleasableBytesStreamOutput out = new ReleasableBytesStreamOutput(mockBigArrays)) {
         bulkRequest.writeTo(out);
         BulkRequest deserializedRequest = new BulkRequest(out.bytes().streamInput());
-        assertEquals(Set.of("index"), deserializedRequest.getIndices());
+        assertEquals(Set.of("index1", "index2"), deserializedRequest.getIndices());
 
     }
 }
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2738fa2 and 14a3573.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/bulk/BulkRequest.java (2 hunks)
  • server/src/test/java/org/opensearch/action/bulk/TransportBulkActionTests.java (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

Signed-off-by: Nils Bandener <[email protected]>

github-actions bot commented Dec 1, 2025

✅ Gradle check result for f8ad58b: SUCCESS

codecov bot commented Dec 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.22%. Comparing base (97d3864) to head (d946f19).
⚠️ Report is 16 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20132      +/-   ##
============================================
- Coverage     73.33%   73.22%   -0.12%     
+ Complexity    71679    71630      -49     
============================================
  Files          5790     5786       -4     
  Lines        327549   327755     +206     
  Branches      47181    47206      +25     
============================================
- Hits         240217   240003     -214     
- Misses        68080    68470     +390     
- Partials      19252    19282      +30     


github-actions bot commented Dec 1, 2025

✅ Gradle check result for d946f19: SUCCESS

@cwperks cwperks merged commit 859f7c3 into opensearch-project:main Dec 1, 2025
35 checks passed
rgsriram pushed a commit to rgsriram/OpenSearch that referenced this pull request Dec 5, 2025
…being deserialized from TransportRequest stream (opensearch-project#20132)

* BulkRequest: Make sure that indices Set is properly initialized when being deserialized from TransportRequest stream

Signed-off-by: Nils Bandener <[email protected]>

* fix

Signed-off-by: Nils Bandener <[email protected]>

* simplified BytesStreamOutput construction

Signed-off-by: Nils Bandener <[email protected]>

---------

Signed-off-by: Nils Bandener <[email protected]>
liuguoqingfz pushed a commit to liuguoqingfz/OpenSearch that referenced this pull request Dec 15, 2025
…being deserialized from TransportRequest stream (opensearch-project#20132)
