Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE #7601] Fix slave acting master bug #7603

Merged

Conversation

gaoyf
Copy link
Contributor

@gaoyf gaoyf commented Dec 1, 2023

Which Issue(s) This PR Fixes

Fixes #7601

Brief Description

Fix #7601's exception and master use slave's TimerCheckPoint。

How Did You Test This Change?

Use #7601's reproduce step to test.

@codecov-commenter
Copy link

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (65faea2) 43.19% compared to head (3509621) 43.16%.

Files Patch % Lines
...apache/rocketmq/store/timer/TimerMessageStore.java 0.00% 4 Missing ⚠️
...apache/rocketmq/broker/slave/SlaveSynchronize.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop    #7603      +/-   ##
=============================================
- Coverage      43.19%   43.16%   -0.03%     
- Complexity      9782     9783       +1     
=============================================
  Files           1162     1162              
  Lines          84378    84381       +3     
  Branches       10955    10957       +2     
=============================================
- Hits           36451    36427      -24     
- Misses         43397    43434      +37     
+ Partials        4530     4520      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -1140,8 +1146,8 @@ public MessageExtBrokerInner convertMessage(MessageExt msgExt, boolean needRoll)
} else {
msgInner.setTopic(msgInner.getProperty(MessageConst.PROPERTY_REAL_TOPIC));
msgInner.setQueueId(Integer.parseInt(msgInner.getProperty(MessageConst.PROPERTY_REAL_QUEUE_ID)));
MessageAccessor.clearProperty(msgInner, MessageConst.PROPERTY_REAL_TOPIC);
MessageAccessor.clearProperty(msgInner, MessageConst.PROPERTY_REAL_QUEUE_ID);
// MessageAccessor.clearProperty(msgInner, MessageConst.PROPERTY_REAL_TOPIC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain why comment these lines and how the second bug occurs?

Copy link
Contributor Author

@gaoyf gaoyf Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted TimerDequeuePutMessageService‘s some key codes, as follows:

while (!isStopped() && !doRes) {
    try {
        MessageExtBrokerInner msg = convert(msgExt, tr.getEnqueueTime(), needRoll(tr.getMagic()));
        doRes = PUT_NEED_RETRY != doPut(msg, needRoll(tr.getMagic()));
    } catch (Throwable t) {
        LOGGER.info("Unknown error", t);
    }
}

If method doPut throw exception, the logic try to execute method convert in a loop。
However, MessageExtBrokerInner and MessageExt hold the same Map of properties, convertMessage FYI, so the following code will return null:

msgInner.getProperty(MessageConst.PROPERTY_REAL_TOPIC)
msgInner.getProperty(MessageConst.PROPERTY_REAL_QUEUE_ID)

Because property MessageConst.PROPERTY_REAL_QUEUE_ID has been removed in the first execution.
So canot remove these properties for the retry logic.
On the other hand,the original logic: MessageExtBrokerInner will be reputed to commitlog,it will contains property MessageConst.PROPERTY_REAL_TOPIC and MessageConst.PROPERTY_REAL_TOPIC,because the fllowing code:

msgInner.setPropertiesString(MessageDecoder.messageProperties2String(msgExt.getProperties()));

That is to say, when retrieving this message, it must contains these two properties,So I don't think commenting out these codes will affect the original logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we should add a messageAccessior.deepCopyProperties instead of using the same reference.

Copy link
Contributor Author

@gaoyf gaoyf Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we should add a messageAccessior.deepCopyProperties instead of using the same reference.

Your idea looks better, although I don't think commenting out this code will have any impact,but I modified the code as you suggested.

@RongtongJin RongtongJin changed the title fix slave acting master bug [ISSUE #7601] Fix slave acting master bug Dec 6, 2023
@RongtongJin RongtongJin merged commit faae647 into apache:develop Dec 7, 2023
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Slave thrown NPE and consumer receive duplicate messages when enable slaveActingMaster and remoteEscape
4 participants