
Optimize RedeliveryCount Handling Mechanism #23944

Open
1 of 2 tasks
NiuBlibing opened this issue Feb 7, 2025 · 7 comments
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@NiuBlibing

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Description

Currently, the message redelivery count (RedeliveryCount) only increases when the client actively calls the redeliver method #18239. However, in certain scenarios—such as when the program runs out of memory or crashes due to an unhandled exception—the message is not properly redelivered, and the count does not increase.

This can lead to messages being retried indefinitely without entering the dead-letter queue (DLQ), affecting system stability and failure recovery.
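For context, here is a minimal sketch of the current client-driven flow with the Pulsar Java client (broker URL, topic, and subscription names are illustrative): the redelivery count only advances through client calls such as negativeAcknowledge, and DeadLetterPolicy routes the message to the DLQ once maxRedeliverCount is exceeded.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.DeadLetterPolicy;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class ClientSideRedeliveryExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative broker URL
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic") // illustrative topic
                .subscriptionName("my-sub")                    // illustrative subscription
                .subscriptionType(SubscriptionType.Shared)     // DLQ requires Shared/Key_Shared
                .deadLetterPolicy(DeadLetterPolicy.builder()
                        .maxRedeliverCount(3) // route to the DLQ after 3 redeliveries
                        .build())
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        try {
            // ... process the message ...
            consumer.acknowledge(msg);
        } catch (Exception e) {
            // The redelivery count only advances through client paths like this one;
            // if the process crashes before reaching here, the count never increases.
            consumer.negativeAcknowledge(msg);
        }

        client.close();
    }
}
```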

Optimization Proposal

It is recommended to move the logic for increasing RedeliveryCount from the client to the Broker side to ensure:

  • When a message is consumed but not acknowledged (due to process crashes or other issues), the Broker correctly increments RedeliveryCount.
  • Messages reach the dead-letter queue (DLQ) in a timely manner after exceeding the maximum retry limit, preventing infinite retries.

Solution

  • The broker detects consumer disconnection and proactively increments RedeliveryCount.
  • The client increments RedeliveryCount when it receives a message.

Alternatives

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@NiuBlibing NiuBlibing added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Feb 7, 2025
@dao-jun
Member

dao-jun commented Feb 7, 2025

On the broker side, the redelivery count is also stored in memory, which means that once the broker restarts or the topic is unloaded, all redelivery counts in memory are cleared; after the topic recovers, they all reset to 0. See: InMemoryRedeliveryTracker.

If we don't persist the redelivery count, this issue is moot; but if we do persist it, will that bring any benefit? It's hard to say.
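For anyone unfamiliar with the class, here is a simplified sketch of the pattern (not the actual Pulsar implementation, which keys on message positions): the counts live only in a process-local map, so a broker restart or topic unload discards them.

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of an in-memory redelivery tracker; the real class is
// InMemoryRedeliveryTracker in the Pulsar broker, and the String key here
// is a stand-in for a message position.
class InMemoryRedeliveryTrackerSketch {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    int incrementAndGetRedeliveryCount(String position) {
        return counts.merge(position, 1, Integer::sum);
    }

    int getRedeliveryCount(String position) {
        return counts.getOrDefault(position, 0);
    }

    // No persistence: when the broker exits or the topic is unloaded, this map
    // is discarded, and after recovery every count effectively resets to 0.
}
```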

@thetumbled
Member

RedeliveryCount is maintained by the broker, not the client.

@Shawyeok
Contributor

Shawyeok commented Mar 10, 2025

Currently, the message redelivery count (RedeliveryCount) only increases when the client actively calls the redeliver method #18239.

@NiuBlibing

It’s partially correct. You can enable ackTimeout on the consumer. Once the consumer returns a message from the receive method, the countdown starts. A background task runs periodically to redeliver messages that exceed the ackTimeout.
For instance, the application thread processing a message may get stuck on a deadlock or some other issue, and thus never get the chance to call negativeAck to redeliver the message to another consumer.
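For reference, enabling ackTimeout with the Java client looks roughly like this (assuming an existing PulsarClient named client; topic, subscription, and timeout values are illustrative):

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;

Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/my-topic") // illustrative topic
        .subscriptionName("my-sub")                    // illustrative subscription
        // If a message handed to the client is not acknowledged within 30s,
        // a client-side background task asks the broker to redeliver it.
        .ackTimeout(30, TimeUnit.SECONDS)
        .subscribe();
```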

@NiuBlibing
Author

Currently, the message redelivery count (RedeliveryCount) only increases when the client actively calls the redeliver method #18239.

@NiuBlibing

It’s partially correct. You can enable ackTimeout on the consumer. Once the consumer returns a message from the receive method, the countdown starts. A background task runs periodically to redeliver messages that exceed the ackTimeout. For instance, the application thread processing a message may get stuck on a deadlock or some other issue, and thus never get the chance to call negativeAck to redeliver the message to another consumer.

Yeah, depending on the client is not fully reliable in these cases.

@Shawyeok
Contributor

Yes, I agree. However, I believe it’s technically sufficient. If your application frequently encounters OOM errors or crashes, you should monitor and address those issues within your application, right?

Moreover, one reason the ackTimeout countdown happens on the client side is the prefetch mechanism: the broker doesn't know exactly when the application starts processing a message, because messages are first placed in the consumer's receiverQueue for performance reasons.
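To make the prefetch point concrete: the size of that local queue is configurable via receiverQueueSize, and shrinking it narrows (but does not close) the gap between broker dispatch and application receive. A sketch, with illustrative names and an existing PulsarClient named client:

```java
import org.apache.pulsar.client.api.Consumer;

Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/my-topic") // illustrative topic
        .subscriptionName("my-sub")                    // illustrative subscription
        // Default is 1000: the broker may push many messages into the client's
        // local receiverQueue before the application ever calls receive().
        // 0 disables prefetch entirely, trading throughput for tighter tracking.
        .receiverQueueSize(0)
        .subscribe();
```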

@Shawyeok
Contributor

When a message is consumed but not acknowledged (due to process crashes or other issues), the Broker correctly increments RedeliveryCount.

I don’t think this is achievable with the current version of the messaging protocol.

A key question to consider is: How does the broker determine that messages have been delivered to the application? When messages are dispatched from the broker to the consumer, they might just be placed in the consumer’s receiverQueue, but the application may not have actually received them yet.

@NiuBlibing
Author

If your application frequently encounters OOM errors or crashes, you should monitor and address those issues within your application, right?

Agreed, detecting bugs and fixing them is the preferred option in most cases. However, I've come across a scenario where certain payloads cause third-party libraries to run out of memory (e.g. processing complex files whose cost can't be determined from file size alone), and it's not easy to dig into those libraries and rewrite the relevant logic. I'd like the crash-retry mechanism in the message queue to handle this, so that these errors are dealt with transparently and don't introduce extra work.

A key question to consider is: How does the broker determine that messages have been delivered to the application? When messages are dispatched from the broker to the consumer, they might just be placed in the consumer’s receiverQueue, but the application may not have actually received them yet.

Maybe it's also not easy for Pulsar to handle this logic in the broker without extra performance overhead.
