
@darjisagar7
Contributor

@darjisagar7 commented Sep 15, 2025

Description

When the follower cluster fetches data from the leader node, a single fetch call can exceed the 2GB limit. This PR handles the issue in the following ways:

  1. Introduces an index-level batch size, which clients can maintain through the index settings.
  2. Retries immediately with a reduced batch size. The reduced value is maintained dynamically by the node; if the node is destroyed, the value is discarded.
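A minimal sketch of how the two mechanisms in the list above could fit together, assuming a holder object like the `batchSizeSettings` referenced in the review snippets. The class name `BatchSizeSettings`, the constructor parameter, and the halving policy are assumptions for illustration; the PR text does not say by how much the batch size shrinks on each retry. The floor of 16 mirrors the `minBatchSize` value shown in the review.

```kotlin
// Hypothetical sketch: class name and halving policy are assumptions, not the PR's code.
class BatchSizeSettings(private val configuredBatchSize: Int) {
    // Node-local override; null means "use the index-level setting".
    // It is held only in memory, so it disappears if the node goes away.
    @Volatile
    private var dynamicBatchSize: Int? = null

    companion object {
        // Floor taken from the review snippet (minBatchSize = 16).
        const val MIN_BATCH_SIZE = 16
    }

    // The override wins when present; otherwise the index-level value applies.
    fun getEffectiveBatchSize(): Int = dynamicBatchSize ?: configuredBatchSize

    // Assumed policy: halve the current size, never going below the floor.
    // A configured size already at or under the floor is left alone.
    @Synchronized
    fun reduceBatchSize() {
        val current = getEffectiveBatchSize()
        if (current > MIN_BATCH_SIZE) {
            dynamicBatchSize = maxOf(MIN_BATCH_SIZE, current / 2)
        }
    }

    // Drop the override and fall back to the index-level value.
    @Synchronized
    fun resetBatchSize() {
        dynamicBatchSize = null
    }
}
```

Halving is one common choice because it shrinks the response geometrically, so a few retries reach a size that fits under the limit; the merged change may use a different step.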

Related Issues

Resolves #1568

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ReleasableBytesStreamOutput

Signed-off-by: Sagar Darji <[email protected]>
val fromSeq = seqNoAlreadyRequested.getAndAdd(batchSize.toLong()) + 1
val toSeq = fromSeq + batchSize - 1
logDebug("Fetching the batch $fromSeq-$toSeq")
val currentBatchSize = batchSizeSettings.getEffectiveBatchSize()
Member

Can we gate this dynamic batch size feature behind a cluster/index setting? We can enable it by default based on testing.

Member

I don't think another setting is needed, as the test coverage is good.
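The snippet in this thread reserves a contiguous range of sequence numbers with an atomic counter before each fetch. A minimal sketch of that arithmetic, assuming the batch size is read once per fetch (the `SeqNoWindow` class and `nextRange` function are hypothetical; only the two range formulas come from the snippet):

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper illustrating the range arithmetic from the review snippet.
class SeqNoWindow(lastRequestedSeqNo: Long) {
    // Highest sequence number already handed out to a fetch.
    private val seqNoAlreadyRequested = AtomicLong(lastRequestedSeqNo)

    // Atomically reserves the inclusive range [fromSeq, toSeq] of length batchSize.
    fun nextRange(batchSize: Int): LongRange {
        require(batchSize > 0) { "batchSize must be positive" }
        val fromSeq = seqNoAlreadyRequested.getAndAdd(batchSize.toLong()) + 1
        val toSeq = fromSeq + batchSize - 1
        return fromSeq..toSeq
    }
}
```

Because `getAndAdd` returns the previous value, concurrent callers receive non-overlapping ranges even when some of them use a reduced batch size.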


// Handle 2GB limit exception specifically
if (e is IllegalArgumentException &&
e.message?.equals("ReleasableBytesStreamOutput cannot hold more than 2GB of data") == true) {
Member

Should we check only for the prefix "ReleasableBytesStreamOutput cannot hold more than"? If the 2GB limit changes in the future, the exact-match check will no longer match and this failure will start occurring again.
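A minimal sketch of the prefix-based check suggested in this comment. The constant and function names are hypothetical; the prefix string is taken from the exception message in the snippet. This sketch inspects only the exception itself and does not walk `cause`, so an error that arrives wrapped in another exception would need extra unwrapping.

```kotlin
// Prefix of the message thrown when the stream buffer exceeds its limit.
// Matching the prefix (not the full message) keeps the check working if the "2GB" figure changes.
const val SIZE_LIMIT_MESSAGE_PREFIX = "ReleasableBytesStreamOutput cannot hold more than"

// Hypothetical helper; the name is an assumption, not part of the PR.
fun isStreamSizeLimitError(e: Throwable): Boolean =
    e is IllegalArgumentException &&
        e.message?.startsWith(SIZE_LIMIT_MESSAGE_PREFIX) == true
```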

const val REPLICATION_EXECUTOR_NAME_FOLLOWER = "replication_follower"
val REPLICATED_INDEX_SETTING: Setting<String> = Setting.simpleString("index.plugins.replication.follower.leader_index",
Setting.Property.InternalIndex, Setting.Property.IndexScope)
// Node-level batch size setting
Member

Nit: this should say cluster-level.

// For dynamic batch size adjustment (2GB fix)
@Volatile
private var dynamicBatchSize: Int? = null
private val minBatchSize = 16
Member

Let's define this as a constant and reuse the value both here and in the settings definition in ReplicationPlugin.

Contributor Author

I have added a constant in the ReplicationPlugin class.
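A minimal sketch of the single-constant approach agreed on in this thread: one floor value shared by setting validation and by the dynamic reduction. The object name `ReplicationPluginConstants`, the constant name, and both functions are hypothetical, since the actual constant in ReplicationPlugin is not shown here; in OpenSearch itself the minimum would typically be passed to the setting definition (for example, the `minValue` argument of `Setting.intSetting`) rather than checked by hand.

```kotlin
// Hypothetical names: the PR adds a constant in ReplicationPlugin, but its exact name is not shown.
object ReplicationPluginConstants {
    const val MIN_BATCH_SIZE = 16
}

// Validation applied when the index-level batch size is parsed (sketch).
fun validateBatchSizeSetting(value: Int): Int {
    require(value >= ReplicationPluginConstants.MIN_BATCH_SIZE) {
        "batch size must be at least ${ReplicationPluginConstants.MIN_BATCH_SIZE}, got $value"
    }
    return value
}

// The same floor clamps the dynamically reduced batch size (halving is an assumption).
fun reducedBatchSize(current: Int): Int =
    maxOf(ReplicationPluginConstants.MIN_BATCH_SIZE, current / 2)
```

Keeping one definition means the setting's lower bound and the retry floor cannot drift apart.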

*/
fun reduceBatchSize() {
batchSizeSettings.reduceBatchSize()
logDebug("Batch size reduced to ${batchSizeSettings.getEffectiveBatchSize()}")
Member

log as INFO

Member

+1

*/
fun resetBatchSize() {
batchSizeSettings.resetBatchSize()
logDebug("Batch size reset to ${batchSizeSettings.getEffectiveBatchSize()}")
Member

log as INFO
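A minimal sketch of the "retry immediately with a smaller batch" flow from the PR description, with the adjustment logged at INFO as requested in these threads. `BatchedFetcher`, the `fetch` callback, the halving step, and the use of `java.util.logging` are all assumptions for illustration; the plugin uses its own logging helpers and fetch path.

```kotlin
import java.util.logging.Logger

// Hypothetical sketch; names, halving step, and logger choice are assumptions.
class BatchedFetcher(initialBatchSize: Int, private val minBatchSize: Int = 16) {
    private val log = Logger.getLogger(BatchedFetcher::class.java.name)

    @Volatile
    var batchSize: Int = initialBatchSize
        private set

    // Calls fetch with the current batch size; on the size-limit error it
    // shrinks the batch and retries at once, rethrowing anything else.
    fun <T> fetchWithRetry(fetch: (Int) -> T): T {
        while (true) {
            try {
                return fetch(batchSize)
            } catch (e: IllegalArgumentException) {
                val msg = e.message ?: throw e
                val isSizeLimit = msg.startsWith("ReleasableBytesStreamOutput cannot hold more than")
                // Give up when the error is unrelated or the floor is already reached.
                if (!isSizeLimit || batchSize <= minBatchSize) throw e
                batchSize = maxOf(minBatchSize, batchSize / 2)
                // INFO level so operators can see the adjustment without debug logging.
                log.info("Batch size reduced to $batchSize")
            }
        }
    }
}
```

Retrying in-process avoids failing the whole replication task; the floor guarantees the loop terminates when even the smallest batch is too large.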


@ankitkala enabled auto-merge (squash) September 19, 2025 11:30
@ankitkala merged commit 5bbe925 into opensearch-project:main Sep 19, 2025
12 of 13 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.7 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.7 2.7
# Navigate to the new working tree
cd .worktrees/backport-2.7
# Create a new branch
git switch --create backport/backport-1580-to-2.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5bbe925d91816b998b9a472d2218f9102367b843
# Push it to GitHub
git push --set-upstream origin backport/backport-1580-to-2.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.7

Then, create a pull request where the base branch is 2.7 and the compare/head branch is backport/backport-1580-to-2.7.

@opensearch-trigger-bot
Contributor

The backport to 2.17 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.17 2.17
# Navigate to the new working tree
cd .worktrees/backport-2.17
# Create a new branch
git switch --create backport/backport-1580-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5bbe925d91816b998b9a472d2218f9102367b843
# Push it to GitHub
git push --set-upstream origin backport/backport-1580-to-2.17
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.17

Then, create a pull request where the base branch is 2.17 and the compare/head branch is backport/backport-1580-to-2.17.

@opensearch-trigger-bot
Contributor

The backport to 2.19 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.19 2.19
# Navigate to the new working tree
cd .worktrees/backport-2.19
# Create a new branch
git switch --create backport/backport-1580-to-2.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5bbe925d91816b998b9a472d2218f9102367b843
# Push it to GitHub
git push --set-upstream origin backport/backport-1580-to-2.19
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.19

Then, create a pull request where the base branch is 2.19 and the compare/head branch is backport/backport-1580-to-2.19.

@opensearch-trigger-bot
Contributor

The backport to 3.1.0 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-3.1.0 3.1.0
# Navigate to the new working tree
cd .worktrees/backport-3.1.0
# Create a new branch
git switch --create backport/backport-1580-to-3.1.0
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5bbe925d91816b998b9a472d2218f9102367b843
# Push it to GitHub
git push --set-upstream origin backport/backport-1580-to-3.1.0
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-3.1.0

Then, create a pull request where the base branch is 3.1.0 and the compare/head branch is backport/backport-1580-to-3.1.0.

Development

Successfully merging this pull request may close these issues.

[BUG] Replication of large documents breaches the size limit of ReleasableBytesStreamOutput

4 participants