dnhatn (Member) commented Jun 26, 2020:

When documents are large, a follower can receive a partial response because the requested range of operations is capped by max_read_request_size rather than by max_read_request_operation_count. In that case, the follower keeps reading the subsequent ranges without checking the remaining capacity of its buffer, so the buffer can use more memory than max_write_buffer_size and can even cause an OOM.

There is another issue where we do not account for outstanding read requests (see the TODO); I will fix it in a follow-up.
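
To make the failure mode concrete, here is a minimal sketch, in Java, of the check this fix implies: before requesting the next range of operations, the follower verifies that the buffered operations still fit within max_write_buffer_size. This is not the actual ShardFollowNodeTask code; the class, its fields, and the sendReadRequest method are illustrative assumptions.

```java
// A minimal sketch of capping reads by the remaining write-buffer budget.
// Not the actual ShardFollowNodeTask code: all names and values here are
// illustrative assumptions.
final class FollowerReadScheduler {
    private final long maxWriteBufferSizeInBytes;   // follower's max_write_buffer_size
    private final int maxReadRequestOperationCount; // max_read_request_operation_count
    long bufferSizeInBytes;      // bytes of operations currently buffered on the follower
    long leaderGlobalCheckpoint; // highest seq_no known to be available on the leader
    private long lastRequestedSeqNo = -1;

    FollowerReadScheduler(long maxWriteBufferSizeInBytes, int maxReadRequestOperationCount) {
        this.maxWriteBufferSizeInBytes = maxWriteBufferSizeInBytes;
        this.maxReadRequestOperationCount = maxReadRequestOperationCount;
    }

    synchronized void maybeReadMore() {
        // The fix: stop issuing read requests while the buffer already holds
        // max_write_buffer_size bytes, even if the last response was partial.
        if (bufferSizeInBytes >= maxWriteBufferSizeInBytes) {
            return; // resume once buffered operations have been written out
        }
        long fromSeqNo = lastRequestedSeqNo + 1;
        if (fromSeqNo > leaderGlobalCheckpoint) {
            return; // fully caught up with the leader
        }
        long toSeqNo = Math.min(leaderGlobalCheckpoint, fromSeqNo + maxReadRequestOperationCount - 1);
        lastRequestedSeqNo = toSeqNo;
        sendReadRequest(fromSeqNo, toSeqNo);
    }

    private void sendReadRequest(long fromSeqNo, long toSeqNo) {
        // In CCR this would be a shard-changes request; the response can still
        // be partial if max_read_request_size is hit before the op count.
        System.out.printf("requesting operations [%d..%d]%n", fromSeqNo, toSeqNo);
    }

    public static void main(String[] args) {
        FollowerReadScheduler scheduler = new FollowerReadScheduler(32L << 20, 5120);
        scheduler.leaderGlobalCheckpoint = 99_999;
        scheduler.maybeReadMore();               // issues the first read request
        scheduler.bufferSizeInBytes = 32L << 20; // simulate a full buffer
        scheduler.maybeReadMore();               // no-op until the buffer drains
    }
}
```

In the real task, the accounting must eventually also cover in-flight read requests, which is the follow-up mentioned above.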

dnhatn added the >bug, :Distributed Indexing/CCR, v8.0.0, v7.8.1, v7.9.0, v6.8.11 labels Jun 26, 2020
dnhatn requested a review from jasontedor June 26, 2020 20:27
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (:Distributed/CCR)

elasticmachine added the Team:Distributed (Obsolete) label Jun 26, 2020
dnhatn requested a review from ywelsch June 30, 2020 12:49
jasontedor (Member) left a comment:

LGTM.

dnhatn (Member, Author) commented Jul 1, 2020:

Thanks Jason.

dnhatn merged commit fadb89c into elastic:master Jul 1, 2020
dnhatn deleted the ccr-fix-buffer-limit branch July 1, 2020 13:00
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jul 1, 2020
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jul 1, 2020
dnhatn added a commit that referenced this pull request Jul 1, 2020
Backport of #58620
dnhatn added a commit that referenced this pull request Jul 1, 2020
Backport of #58620
dnhatn added a commit that referenced this pull request Jul 1, 2020
Backport of #58620