HBASE-24807 Backport HBASE-20417 to branch-1 #2197
try (WALEntryStream entryStream =
    new WALEntryStream(logQueue, fs, conf, lastReadPosition, metrics)) {
  while (isReaderRunning()) { // loop here to keep reusing stream while we can
    if (!source.isPeerEnabled()) {
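The highlighted guard stops the reader from reading WAL entries while its peer is disabled. Below is a minimal sketch of that behavior. It assumes the guard backs off and re-checks instead of exiting the loop. `PeerAwareReaderSketch`, `pendingBatches`, and `sleepForRetriesMs` are hypothetical illustration names, not HBase's actual `ReplicationSourceWALReader` API. A `BooleanSupplier` stands in for `source.isPeerEnabled()`, and a `Deque` stands in for the `WALEntryStream`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

// Hypothetical, simplified stand-in for a WAL reader thread; not HBase's class.
final class PeerAwareReaderSketch {
  private final BooleanSupplier peerEnabled;   // plays the role of source.isPeerEnabled()
  private final Deque<String> walEntries;      // plays the role of the WALEntryStream
  private final Deque<String> pendingBatches = new ArrayDeque<>(); // batches queued for shipping
  private final long sleepForRetriesMs;        // assumed back-off between re-checks

  PeerAwareReaderSketch(BooleanSupplier peerEnabled, Deque<String> walEntries,
      long sleepForRetriesMs) {
    this.peerEnabled = peerEnabled;
    this.walEntries = walEntries;
    this.sleepForRetriesMs = sleepForRetriesMs;
  }

  /** Runs at most maxIterations loop iterations and returns how many entries were read. */
  int run(int maxIterations) {
    int read = 0;
    for (int i = 0; i < maxIterations; i++) {
      if (!peerEnabled.getAsBoolean()) {
        // Peer disabled: read nothing and queue nothing, then back off and re-check.
        try {
          Thread.sleep(sleepForRetriesMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve interrupt status and stop
          break;
        }
        continue;
      }
      String entry = walEntries.poll();
      if (entry == null) {
        break; // nothing left to read in this sketch
      }
      pendingBatches.add(entry);
      read++;
    }
    return read;
  }

  int pendingBatchCount() {
    return pendingBatches.size();
  }
}
```

While the peer is disabled, the WAL entries stay unread and no batch is queued. Once the peer is re-enabled, the same loop picks them up again.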
This is just a safeguard to keep batches from accumulating, right? I can't think of any other implications of the patch.
Yes, and accumulated batches can cause major problems because they count toward the overall buffer usage tracked by ReplicationSourceManager. If buffer usage reaches the quota limit, replication gets stuck. Because buffer usage is checked in ReplicationSourceManager, all peers effectively share a single buffer. So if one peer is disabled, the sources for the other peers, which should keep shipping replicated edits, also get stuck until the RegionServer (RS) is restarted.
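The starvation scenario described above can be modeled in a few lines. This is a toy sketch under stated assumptions. The classes `SharedBufferQuota` and `QuotaStarvationDemo` are hypothetical and are not HBase's ReplicationSourceManager API. The model assumes each read batch reserves bytes from one RS-wide quota and releases them only after shipping. It also assumes a disabled peer's batches are never shipped. The model compares an enabled peer's throughput with and without a guard that skips reads for the disabled peer.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of one buffer quota shared by all replication sources on an RS.
final class SharedBufferQuota {
  private final AtomicLong used = new AtomicLong();
  private final long quota;

  SharedBufferQuota(long quota) {
    this.quota = quota;
  }

  /** Reserves size bytes, or returns false if that would exceed the shared quota. */
  boolean tryAcquire(long size) {
    while (true) {
      long cur = used.get();
      if (cur + size > quota) {
        return false;
      }
      if (used.compareAndSet(cur, cur + size)) {
        return true;
      }
    }
  }

  /** Returns bytes to the shared pool once a batch has been shipped. */
  void release(long size) {
    used.addAndGet(-size);
  }
}

final class QuotaStarvationDemo {
  /**
   * Simulates `rounds` read cycles for two peers sharing one quota. Peer A is disabled,
   * so anything it reads is held and never shipped; peer B is enabled and ships (releases)
   * each batch immediately. Returns how many batches peer B shipped.
   */
  static int batchesShippedByEnabledPeer(boolean skipReadsForDisabledPeer, int rounds,
      long quota, long batchSize) {
    SharedBufferQuota buffer = new SharedBufferQuota(quota);
    int shippedByB = 0;
    for (int i = 0; i < rounds; i++) {
      if (!skipReadsForDisabledPeer) {
        buffer.tryAcquire(batchSize); // peer A keeps reading; the reservation is never released
      }
      if (buffer.tryAcquire(batchSize)) { // peer B reads a batch...
        buffer.release(batchSize);        // ...ships it and frees the bytes
        shippedByB++;
      }
    }
    return shippedByB;
  }
}
```

With a quota of 100 and 10-byte batches over 20 rounds, peer B ships all 20 batches when the disabled peer skips reads. Without the guard, peer B ships only 9 before the disabled peer's held batches exhaust the shared quota. After that, peer B makes no progress for the rest of the run.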
Makes sense. I've been following the JIRA updates on the issue that Josh created.
virajjasani left a comment
+1
Let me re-trigger the build; it should work now.
💔 -1 overall
This message was automatically generated.
Checking on the unit test (UT) failures.
The test failures look unrelated; those tests pass locally.