Deduplication fails when the batch message contains duplicate message and valid message #6273
Comments
@codelipenghui Do you think there could be a relationship between this issue and #6224?
I'm adding my Slack discussion with @codelipenghui (with his permission) to create a better record of our discussion on this issue. He documented what happens in this issue: "For example, we have four batch messages in the producer pending message queue. The messages look like this (I show the sequence IDs to make it easier to follow):
Then message-1 is published to the broker:
When the producer flushes message-2 to the broker, message-2 has a sequence ID that is lower than the last pushed sequence ID of the producer. So the producer should stop flushing message-3, because it should first get the response for message-2 and re-batch the messages of message-2. After re-batching, the producer should retry the re-batched message. The current state of the broker and producer is:
After message-2 is published, the producer starts flushing message-3, so that the retry of the re-batched message can be handled properly. Regarding:
I asked: He replied: I said: He replied: I replied: He added: I asked: He replied: To better understand the current implementation, he suggested that I investigate these tests:
@devinbost It doesn't look related.
…ial duplicated messages and non-duplicated messages into a batch. (#6326) Fixes #6273. Motivation: The root cause of #6273 is that potentially duplicated messages and non-duplicated messages are combined into one batch. So the producer needs to flush the potentially duplicated messages first and then add the non-duplicated messages to a batch.
…ial duplicated messages and non-duplicated messages into a batch. (apache#6326) Fixes apache#6273. Motivation: The root cause of apache#6273 is that potentially duplicated messages and non-duplicated messages are combined into one batch. So the producer needs to flush the potentially duplicated messages first and then add the non-duplicated messages to a batch. (cherry picked from commit b898f49)
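The idea behind the fix can be illustrated with a small model. This is only a sketch, not the code from #6326: the helper `split_batches` and its parameters are hypothetical, and it assumes that a pending message counts as a potential duplicate when its sequence ID is not higher than the producer's last pushed sequence ID.

```python
def split_batches(pending_sequence_ids, last_pushed_sequence_id):
    """Return batches in flush order (hypothetical helper, not the Pulsar client API).

    Potential duplicates are placed in their own batch and flushed first,
    so they never share a batch with messages that are definitely new.
    """
    # Assumption: an ID equal to the last pushed ID is a potential duplicate.
    potential_duplicates = [
        seq for seq in pending_sequence_ids if seq <= last_pushed_sequence_id
    ]
    new_messages = [
        seq for seq in pending_sequence_ids if seq > last_pushed_sequence_id
    ]

    batches = []
    if potential_duplicates:
        batches.append(potential_duplicates)  # flushed first
    if new_messages:
        batches.append(new_messages)  # batched only after the duplicates
    return batches


batches = split_batches([2, 3, 4], last_pushed_sequence_id=2)
print(batches)
```

With the two groups separated, the broker can reject the first batch as a duplicate without affecting the second one.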
Describe the bug
This bug was discussed in the Slack channel: https://apache-pulsar.slack.com/archives/C5ZSVEN4E/p1581119933163000. The problem occurs when a batch message contains both duplicate data and valid data. If the sequence ID of a message in the batch is lower than the highest sequence ID maintained by the broker, the whole batch message that contains it is also considered a duplicate. As a result, the valid messages in that batch are not stored, and consumers never receive them.
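The behaviour described above can be reproduced with a toy model. This is a minimal sketch under stated assumptions, not the broker's actual deduplication code: the `Broker` class and `publish_batch` method are hypothetical, and the sketch assumes that a sequence ID equal to the highest stored ID also counts as a duplicate.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy broker that tracks the highest sequence ID it has stored (hypothetical)."""

    highest_sequence_id: int = -1
    stored: list = field(default_factory=list)

    def publish_batch(self, sequence_ids):
        # Batch-level check as described in the report: a single stale
        # sequence ID marks the whole batch as a duplicate.
        # Assumption: an ID equal to the highest stored ID is also a duplicate.
        if any(seq <= self.highest_sequence_id for seq in sequence_ids):
            return False  # entire batch dropped, valid messages included
        self.stored.extend(sequence_ids)
        self.highest_sequence_id = max(sequence_ids)
        return True


broker = Broker()
broker.publish_batch([0, 1, 2])
# Sequence ID 2 is a duplicate, but 3 and 4 are new messages.
accepted = broker.publish_batch([2, 3, 4])
print(accepted, broker.stored)
```

In this model the second batch is rejected as a whole, so messages 3 and 4 are never stored even though the broker has not seen them before.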