[Bug] Slave throws NPE and consumer receives duplicate messages when slaveActingMaster and remoteEscape are enabled #7601

Closed
3 tasks done
gaoyf opened this issue Dec 1, 2023 · 0 comments · Fixed by #7603

gaoyf commented Dec 1, 2023

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

OS: CentOS 6.9

RocketMQ version

branch: (develop|tag 5.1.4) version: 5.1.4

JDK Version

JDK: 1.8.0_202

Describe the Bug

When I enable slaveActingMaster and remoteEscape, send some messages to the master, and then shut the master down, the slave throws the following exceptions:

2023-11-28 17:56:25 INFO TimerDequeuePutMessageService - Unknown error
java.lang.NullPointerException: null
        at org.apache.rocketmq.store.timer.TimerMessageStore.doPut(TimerMessageStore.java:1088)
        at org.apache.rocketmq.store.timer.TimerMessageStore.access$1900(TimerMessageStore.java:71)
        at org.apache.rocketmq.store.timer.TimerMessageStore$TimerDequeuePutMessageService.run(TimerMessageStore.java:1487)
        at java.lang.Thread.run(Thread.java:748)
2023-11-28 17:56:25 INFO TimerDequeuePutMessageService - Unknown error
java.lang.NumberFormatException: null
        at java.lang.Integer.parseInt(Integer.java:542)
        at java.lang.Integer.parseInt(Integer.java:615)
        at org.apache.rocketmq.store.timer.TimerMessageStore.convertMessage(TimerMessageStore.java:1142)
        at org.apache.rocketmq.store.timer.TimerMessageStore.convert(TimerMessageStore.java:1059)
        at org.apache.rocketmq.store.timer.TimerMessageStore.access$1800(TimerMessageStore.java:71)
        at org.apache.rocketmq.store.timer.TimerMessageStore$TimerDequeuePutMessageService.run(TimerMessageStore.java:1486)
        at java.lang.Thread.run(Thread.java:748)
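
The `NumberFormatException: null` in the second trace is what `Integer.parseInt` reports on JDK 8 when it is handed a `null` string, which suggests that `convertMessage` read a message property that was not set. The report does not say which property that is, so the sketch below uses a hypothetical property name and a hypothetical helper class. It is not RocketMQ code, only an illustration of a defensive parse that would avoid this kind of failure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of RocketMQ: parse an integer property defensively.
final class SafeIntProperty {
    private SafeIntProperty() {}

    /** Returns the parsed value, or fallback when the property is missing or not a number. */
    static int parseOrDefault(Map<String, String> properties, String key, int fallback) {
        String raw = properties.get(key); // null when the key is absent
        if (raw == null) {
            // Integer.parseInt(null) would throw NumberFormatException here.
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("EXAMPLE_RETRY_COUNT", "3"); // hypothetical property name
        System.out.println(parseOrDefault(props, "EXAMPLE_RETRY_COUNT", 0)); // present
        System.out.println(parseOrDefault(props, "EXAMPLE_MISSING", 0));     // absent
    }
}
```

Whether falling back to a default is the right behaviour depends on what the property means; the actual fix may handle the missing value differently.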

After I fixed the exceptions and repeated the steps above, I restarted the master and the consumer received duplicate messages.

Steps to Reproduce

Steps to reproduce the first problem:

  1. Enable the following configs:
    1. NameServer:
      supportActingMaster=true
    2. Broker:
      enableSlaveActingMaster=true
      enableRemoteEscape=true
      totalReplicas=2
  2. Deploy one NameServer and two groups of brokers, such as:
    1. master: broker-a, slave: broker-a-s
    2. master: broker-b, slave: broker-b-s
  3. Run a consumer and send some timer messages, for example:
    message.setDeliverTimeMs(System.currentTimeMillis() + 5 * 60_000L);
  4. Shut down the master. The slave throws an NPE and a NumberFormatException when the timer messages escape.

Steps to reproduce the second problem:

  1. Fix the exceptions above and repeat the previous steps.
  2. Restart the master. The consumer receives duplicate messages.
    For example, the producer sent 10 messages and the consumer received all 10 before the master restarted. After the master restarted, the consumer received the previously delivered messages again.
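
One way to picture the duplicate delivery, in line with the fix note further down about the master's timerCheckPoint not being updated: if the slave delivers messages while the master is down but the master's recorded progress stays where it was, the master replays the same range after it restarts. The sketch below is a toy model under that assumption. The class, the list-based log, and the integer checkpoint are all hypothetical and do not reflect RocketMQ's actual data structures.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a "log" of timer messages and an integer checkpoint offset.
final class CheckpointReplayDemo {
    private CheckpointReplayDemo() {}

    /** Returns the entries a restarted node would replay: everything at or after the checkpoint. */
    static List<String> replayFrom(List<String> log, int checkpoint) {
        return new ArrayList<>(log.subList(checkpoint, log.size()));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            log.add("msg-" + i);
        }

        // The slave has already delivered all 10 messages while the master was down.
        int staleCheckpoint = 0;   // master never learned about the slave's progress
        int syncedCheckpoint = 10; // master's checkpoint reflects the slave's progress

        System.out.println(replayFrom(log, staleCheckpoint).size());  // 10 duplicates
        System.out.println(replayFrom(log, syncedCheckpoint).size()); // 0 duplicates
    }
}
```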

What Did You Expect to See?

  1. The slave does not throw exceptions.
  2. The consumer does not receive duplicate messages.

What Did You See Instead?

  1. The slave throws exceptions.
  2. The consumer receives duplicate messages.

Additional Context

No response

RongtongJin pushed a commit that referenced this issue Dec 7, 2023
* fix NullPointerException when message escape to remote

* fix NumberFormatException when message retry to escape to remote

* fix timerCheckPoint of the master is not updated, causing the timer message to be replayed after master is restarted

* Use properties copies instead of referencing the same map when converting message
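
The last item in the commit message describes a general aliasing hazard: if a converted message holds a reference to the source message's properties map, any change made to the converted message also changes the source. The sketch below illustrates the difference with a hypothetical `Msg` class. It is not the RocketMQ message type and not the code from the fix, only a minimal illustration of sharing versus copying a map.

```java
import java.util.HashMap;
import java.util.Map;

final class PropertiesCopyDemo {
    private PropertiesCopyDemo() {}

    /** Hypothetical stand-in for a message that carries a properties map. */
    static final class Msg {
        final Map<String, String> properties;

        Msg(Map<String, String> properties) {
            this.properties = properties;
        }
    }

    /** Shares the source map: edits to the converted message leak back to the source. */
    static Msg convertSharing(Msg source) {
        return new Msg(source.properties);
    }

    /** Copies the map: the converted message can be edited independently. */
    static Msg convertCopying(Msg source) {
        return new Msg(new HashMap<>(source.properties));
    }

    public static void main(String[] args) {
        Msg original = new Msg(new HashMap<>());
        original.properties.put("k", "v");

        Msg shared = convertSharing(original);
        shared.properties.remove("k");
        System.out.println(original.properties.containsKey("k")); // false: the source was modified

        original.properties.put("k", "v");
        Msg copied = convertCopying(original);
        copied.properties.remove("k");
        System.out.println(original.properties.containsKey("k")); // true: the source is untouched
    }
}
```

With a shared map, a property removed during one conversion attempt is also gone from the source on the next attempt, which is one plausible way a later read could find a missing value.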
gaoyf added a commit to sohutv/rocketmq that referenced this issue Mar 13, 2024