HBASE-24183 [flakey test] replication.TestAddToSerialReplicationPeer #1514
|
🎊 +1 overall
This message was automatically generated. |
|
There are two flaky failures. For the testAddToSerialPeer failure, the test just needs to make sure that the in-memory map of the source RS (not the RS that the region moves to) contains only the new WAL file. After that, there is still one failure, which is common to testAddToSerialPeer and testChangeToSerial. If the old WAL file from before the rollover is still in the in-memory map of ReplicationSourceManager during peer disable/enable/config update, it can still be replicated from the beginning to the peer cluster. If that happens, the old WAL entries and the new WAL entries are written to the same WAL file, which results in out-of-order sequence numbers. waitUntilReplicatedToTheCurrentWALFile() does not really guarantee that the in-memory map has advanced to the new WAL file; there is a small window in which the map holds only one WAL file, and it is the old one. I added a new check to make sure that the in-memory map has only the new WAL file. This could happen in a production cluster as well; however, I do not think that is the purpose of these two test cases. @Apache9, please provide your input, thanks. |
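To make the window described above concrete, here is a minimal, self-contained sketch of such a check. It is not the code from the patch: `WalQueueCheck`, `onlyNewWal`, `waitFor`, the group name, and the WAL file names are all hypothetical, and a plain map of sorted sets stands in for the in-memory map of the replication source manager. The point it illustrates is that the check has to compare the file name, because a map that holds a single WAL file may still be holding the old one.

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not an HBase class.
final class WalQueueCheck {

  // Returns true only if every WAL group holds exactly one file and that file is newWal.
  // A size-only check would also pass in the window where the map holds just the old file.
  static boolean onlyNewWal(Map<String, NavigableSet<String>> walsByGroup, String newWal) {
    if (walsByGroup == null || walsByGroup.isEmpty()) {
      return false;
    }
    for (NavigableSet<String> wals : walsByGroup.values()) {
      if (wals.size() != 1 || !wals.first().equals(newWal)) {
        return false;
      }
    }
    return true;
  }

  // Polls the condition until it holds or the timeout elapses; returns the final verdict.
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Stand-in for the in-memory map: WAL group -> sorted WAL file names (assumed shape).
    Map<String, NavigableSet<String>> wals = new ConcurrentHashMap<>();
    NavigableSet<String> group = new ConcurrentSkipListSet<>();
    group.add("wal.1"); // old WAL file, still the only entry right after the roll is requested
    wals.put("group-a", group);

    // Simulates the source advancing to the new WAL file a little later.
    Thread roller = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      group.add("wal.2");
      group.remove("wal.1");
    });
    roller.start();

    boolean ok = waitFor(5000, 10, () -> onlyNewWal(wals, "wal.2"));
    System.out.println("map holds only the new WAL: " + ok);
  }
}
```

In a test, a wait like this would sit before the peer is disabled, enabled, or reconfigured, so that the old WAL file can no longer be picked up and replicated from the beginning.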
Apache9
+1
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
Are the test failures related, @huaxiangsun? Let me rerun the tests to see. Otherwise, +1 on the patch. Let's try it. I like the root cause analysis. Stick that up on the issue too. |
|
Thanks @Apache9 and @saintstack. The failed tests are not related, because the patch only touches the specific test case. I checked the three failed cases; they are all due to a malformed XML error, so they could be related to the known native thread issue. |
|
Forwarded the comments to the JIRA issue. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
The test failures are unrelated, so I am merging. Thanks for the review. |
…pache#1514) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: stack <[email protected]>
…1514) (#1525) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: stack <[email protected]>
…1514) (#1526) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: stack <[email protected]>
…pache#1514) (apache#1526) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: stack <[email protected]>
Will put up a root cause analysis later.