
Optimize RedeliveryCount Handling Mechanism #23944

Open
1 of 2 tasks
NiuBlibing opened this issue Feb 7, 2025 · 7 comments
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@NiuBlibing

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Description

Currently, the message redelivery count (RedeliveryCount) only increases when the client actively calls the redeliver method #18239. However, in certain scenarios—such as when the program runs out of memory or crashes due to an unhandled exception—the message is not properly redelivered, and the count does not increase.

This can lead to messages being retried indefinitely without entering the dead-letter queue (DLQ), affecting system stability and failure recovery.
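For context, here is a minimal sketch of the current client-driven flow with the Pulsar Java client (broker URL, topic, and subscription names are illustrative): the redelivery count only advances through client calls such as negativeAcknowledge, and DeadLetterPolicy routes the message to the DLQ once maxRedeliverCount is exceeded.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.DeadLetterPolicy;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class ClientSideRedeliveryExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative broker URL
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic") // illustrative topic
                .subscriptionName("my-sub")                    // illustrative subscription
                .subscriptionType(SubscriptionType.Shared)     // DLQ requires Shared/Key_Shared
                .deadLetterPolicy(DeadLetterPolicy.builder()
                        .maxRedeliverCount(3) // route to the DLQ after 3 redeliveries
                        .build())
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        try {
            // ... process the message ...
            consumer.acknowledge(msg);
        } catch (Exception e) {
            // The redelivery count only advances through client paths like this one;
            // if the process crashes before reaching here, the count never increases.
            consumer.negativeAcknowledge(msg);
        }

        client.close();
    }
}
```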

Optimization Proposal

It is recommended to move the logic for increasing RedeliveryCount from the client to the Broker side to ensure:

  • When a message is consumed but not acknowledged (due to process crashes or other issues), the Broker correctly increments RedeliveryCount.
  • Messages reach the dead-letter queue (DLQ) in a timely manner after exceeding the maximum retry limit, preventing infinite retries.

Solution

  • The broker detects consumer disconnection and proactively increments RedeliveryCount.
  • The client increments RedeliveryCount when it receives a message.

Alternatives

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@NiuBlibing NiuBlibing added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Feb 7, 2025
@dao-jun
Member

dao-jun commented Feb 7, 2025

On the broker side, the redelivery count is also stored in memory, which means that once the broker restarts or the topic is unloaded, all redelivery counts in memory are cleared; after the topic recovers, they all reset to 0. See: InMemoryRedeliveryTracker.

If we don't persist the redelivery count, this issue is moot; but if we do persist it, will that bring any benefit? It's hard to say.
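For anyone unfamiliar with the class, here is a simplified sketch of the pattern (not the actual Pulsar implementation, which keys on message positions): the counts live only in a process-local map, so a broker restart or topic unload discards them.

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of an in-memory redelivery tracker; the real class is
// InMemoryRedeliveryTracker in the Pulsar broker, and the String key here
// is a stand-in for a message position.
class InMemoryRedeliveryTrackerSketch {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    int incrementAndGetRedeliveryCount(String position) {
        return counts.merge(position, 1, Integer::sum);
    }

    int getRedeliveryCount(String position) {
        return counts.getOrDefault(position, 0);
    }

    // No persistence: when the broker exits or the topic is unloaded, this map
    // is discarded, and after recovery every count effectively resets to 0.
}
```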

@thetumbled
Member

RedeliveryCount is maintained by the broker, not the client.

@Shawyeok
Contributor

Shawyeok commented Mar 10, 2025

Currently, the message redelivery count (RedeliveryCount) only increases when the client actively calls the redeliver method #18239.

@NiuBlibing

It’s partially correct. You can enable ackTimeout on the consumer. Once the consumer returns a message from the receive method, the countdown starts. A background task runs periodically to redeliver messages that exceed the ackTimeout.
For instance, the application thread processing a message may get stuck on a deadlock or some other issue, and thus never get the chance to call negativeAck to redeliver the message to another consumer.
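For reference, enabling ackTimeout with the Java client looks roughly like this (assuming an existing PulsarClient named client; topic, subscription, and timeout values are illustrative):

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;

Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/my-topic") // illustrative topic
        .subscriptionName("my-sub")                    // illustrative subscription
        // If a message handed to the client is not acknowledged within 30s,
        // a client-side background task asks the broker to redeliver it.
        .ackTimeout(30, TimeUnit.SECONDS)
        .subscribe();
```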

@NiuBlibing
Author

Currently, the message redelivery count (RedeliveryCount) only increases when the client actively calls the redeliver method #18239.

@NiuBlibing

It’s partially correct. You can enable ackTimeout on the consumer. Once the consumer returns a message from the receive method, the countdown starts. A background task runs periodically to redeliver messages that exceed the ackTimeout. For instance, the application thread processing a message may get stuck on a deadlock or some other issue, and thus never get the chance to call negativeAck to redeliver the message to another consumer.

Yeah, depending on the client is not fully reliable in these cases.

@Shawyeok
Contributor

Yes, I agree. However, I believe it’s technically sufficient. If your application frequently encounters OOM errors or crashes, you should monitor and address those issues within your application, right?

Moreover, one reason the ackTimeout countdown happens on the client side is the prefetch mechanism: the broker doesn't know exactly when the application starts processing a message, because messages are first placed in the consumer's receiverQueue for performance reasons.
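To make the prefetch point concrete: the size of that local queue is configurable via receiverQueueSize, and shrinking it narrows (but does not close) the gap between broker dispatch and application receive. A sketch, with illustrative names and an existing PulsarClient named client:

```java
import org.apache.pulsar.client.api.Consumer;

Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/my-topic") // illustrative topic
        .subscriptionName("my-sub")                    // illustrative subscription
        // Default is 1000: the broker may push many messages into the client's
        // local receiverQueue before the application ever calls receive().
        // 0 disables prefetch entirely, trading throughput for tighter tracking.
        .receiverQueueSize(0)
        .subscribe();
```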

@Shawyeok
Contributor

When a message is consumed but not acknowledged (due to process crashes or other issues), the Broker correctly increments RedeliveryCount.

I don’t think this is achievable with the current version of the messaging protocol.

A key question to consider is: How does the broker determine that messages have been delivered to the application? When messages are dispatched from the broker to the consumer, they might just be placed in the consumer’s receiverQueue, but the application may not have actually received them yet.

@NiuBlibing
Author

If your application frequently encounters OOM errors or crashes, you should monitor and address those issues within your application, right?

Agreed, detecting bugs and fixing them is the preferred option in most cases. However, I've come across a scenario where certain payloads cause third-party libraries to run out of memory (e.g. processing complex files whose cost can't be determined from file size alone), and it's not easy to dig into those libraries and rewrite the relevant logic. I'd like the crash-retry mechanism in the message queue to handle this, so that these errors are dealt with transparently and don't introduce extra work.

A key question to consider is: How does the broker determine that messages have been delivered to the application? When messages are dispatched from the broker to the consumer, they might just be placed in the consumer’s receiverQueue, but the application may not have actually received them yet.

Maybe it's also not easy for Pulsar to handle this logic in the broker without extra performance overhead.
