@nkurihar nkurihar commented Oct 2, 2024

Fixes #446

Motivation

We found an ack failure issue: #446

In "multi topics" cases (e.g. partitioned, list, regex), messageListeners can start processing messages before all child topics have been subscribed.
If acknowledge() is called in a messageListener (a very typical usage) before subscription completes, it fails with an AlreadyClosed error because the state of the parent consumer is not yet Ready:

void MultiTopicsConsumerImpl::acknowledgeAsync(const MessageId& msgId, ResultCallback callback) {
    if (state_ != Ready) {
        interceptors_->onAcknowledge(Consumer(shared_from_this()), ResultAlreadyClosed, msgId);
        callback(ResultAlreadyClosed);
        return;
    }

This results in ack holes and, eventually, a full backlog.
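
The race can be reproduced in miniature. The following is a minimal sketch, not the client's actual code: ToyMultiTopicsConsumer, deliverFromChild, markAllChildrenSubscribed and simulateEarlyAck are hypothetical names, and only two states are modeled, whereas the real consumer has more.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Simplified stand-ins for illustration; these are NOT the real Pulsar types.
enum class Result { Ok, AlreadyClosed };
enum class State { Pending, Ready };

class ToyMultiTopicsConsumer {
public:
    using Listener = std::function<void(ToyMultiTopicsConsumer&, int /*msgId*/)>;

    explicit ToyMultiTopicsConsumer(Listener listener) : listener_(std::move(listener)) {}

    // Same shape as the guard quoted above: acks are rejected until the parent is Ready.
    Result acknowledge(int msgId) {
        if (state_ != State::Ready) {
            return Result::AlreadyClosed;
        }
        acked_.push_back(msgId);
        return Result::Ok;
    }

    // A child consumer that is already connected delivers a message;
    // the listener runs immediately, whatever the parent's state is.
    void deliverFromChild(int msgId) { listener_(*this, msgId); }

    // Called once every child topic has been subscribed.
    void markAllChildrenSubscribed() { state_ = State::Ready; }

private:
    State state_ = State::Pending;
    Listener listener_;
    std::vector<int> acked_;
};

// Delivers one message before the parent is Ready and one after,
// and returns the result of the ack issued by the listener for each.
std::vector<Result> simulateEarlyAck() {
    std::vector<Result> results;
    ToyMultiTopicsConsumer consumer([&results](ToyMultiTopicsConsumer& c, int msgId) {
        results.push_back(c.acknowledge(msgId));
    });
    consumer.deliverFromChild(1);          // some children are still subscribing
    consumer.markAllChildrenSubscribed();  // parent becomes Ready
    consumer.deliverFromChild(2);
    return results;
}
```

In this model, simulateEarlyAck() returns AlreadyClosed for message 1 and Ok for message 2. Message 1 stays unacknowledged, which is the ack hole described above.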

Modifications

  • MultiTopicsConsumerImpl: Start with messageListeners paused by setting startPaused to true, and resume them once all child topics have been subscribed.
  • ConsumerImpl: Delete unused code.
    • The deleted code tries to skip sendFlowPermitsToBroker() on the first connection.
      • Its motivation seems to be the same as this PR's, i.e. to prevent messageListeners from starting before subscription completes.
    • However, it does not actually work: the variable firstTime appears to always be false at
      if (consumerTopicType_ == NonPartitioned || !firstTime) {
      because it is set to false at
      if (firstTime) {
          firstTime = false;
      }
      • It is also a static variable in a method, so it is shared by all ConsumerImpl instances, and it can never return to true once it becomes false.
    • It seems better for child consumers not to have to care about their parents and to be completely independent of them.
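
The point about the static variable is a general C++ property and can be shown in isolation. The sketch below uses hypothetical names (ToyConsumer, isFirstConnection, simulateFirstConnections) and does not reproduce the removed ConsumerImpl code or the order of its checks. It only shows that a function-local static inside a member function is a single variable for the whole process, not one per object.

```cpp
#include <vector>

// Hypothetical stand-in; the real ConsumerImpl is far more involved.
class ToyConsumer {
public:
    // Returns true if this call is treated as the "first" connection.
    bool isFirstConnection() {
        static bool firstTime = true;  // one flag for the whole process, not one per object
        bool wasFirst = firstTime;
        firstTime = false;             // nothing ever sets it back to true
        return wasFirst;
    }
};

// Asks three independent consumers, in order, whether they see a "first" connection.
std::vector<bool> simulateFirstConnections() {
    ToyConsumer a, b, c;
    // Elements of a braced initializer list are evaluated left to right.
    return {a.isFirstConnection(), b.isFirstConnection(), c.isFirstConnection()};
}
```

Under these assumptions, the first call to simulateFirstConnections() returns {true, false, false}: only the very first consumer in the process ever sees firstTime as true. A second call returns all false, because the flag is never reset.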

Verifying this change

  • Make sure that the change passes the CI checks.

A specific test for this issue is difficult to implement because the failure does not occur every time.

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@nkurihar nkurihar added the bug Something isn't working label Oct 3, 2024

@BewareMyPower BewareMyPower left a comment

Could you add a test for it?

@BewareMyPower BewareMyPower added this to the 3.7.0 milestone Oct 11, 2024
@BewareMyPower

A specific test for this issue is difficult to implement because the failure does not occur every time.

Okay, I see it.

@BewareMyPower BewareMyPower merged commit 54e529a into apache:main Oct 11, 2024
BewareMyPower pushed a commit that referenced this pull request Apr 22, 2025
…481)

This was introduced in v3.7.0 via #447, which contained a change that attempted to fix an issue where a premature ack of a message before a multi-topic subscriber was ready could have caused a crash. To fix the original bug, all multi-topic subscriptions are started with their message listeners paused. They are later unpaused once all topics are successfully subscribed and connected. However, when new topics are discovered on a regex subscription, their listeners also start in a paused state, and there is no mechanism to resume them. Hence, they get stuck, and no messages for the new topics are processed.

This change adds a call to resume any new listeners after new topics are added.
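
The pause-then-resume scheme and the follow-up fix can be modeled with a small toy. This is a sketch under stated assumptions, not the client's implementation: ToyChild, ToyParent, startSubscribing, onAllSubscribed, onTopicDiscovered, publish and simulate are all hypothetical names, and buffering of messages while paused is simplified to a plain vector.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy child consumer: holds messages while paused, hands them over once resumed.
class ToyChild {
public:
    explicit ToyChild(bool startPaused) : paused_(startPaused) {}

    void receive(int msgId, std::vector<int>& delivered) {
        if (paused_) {
            pending_.push_back(msgId);
        } else {
            delivered.push_back(msgId);
        }
    }

    void resume(std::vector<int>& delivered) {
        paused_ = false;
        for (int id : pending_) delivered.push_back(id);
        pending_.clear();
    }

private:
    bool paused_;
    std::vector<int> pending_;
};

// Toy parent: every child starts paused; resumeNewTopics toggles the follow-up fix.
class ToyParent {
public:
    explicit ToyParent(bool resumeNewTopics) : resumeNewTopics_(resumeNewTopics) {}

    void startSubscribing(const std::vector<std::string>& topics) {
        for (const auto& t : topics) children_.emplace(t, ToyChild(true));
    }

    // All initial children are subscribed: resume every listener.
    void onAllSubscribed() {
        for (auto& kv : children_) kv.second.resume(delivered_);
    }

    // A regex subscription discovers a topic later; its child also starts paused.
    void onTopicDiscovered(const std::string& topic) {
        auto it = children_.emplace(topic, ToyChild(true)).first;
        if (resumeNewTopics_) it->second.resume(delivered_);
    }

    void publish(const std::string& topic, int msgId) {
        auto it = children_.find(topic);
        if (it != children_.end()) it->second.receive(msgId, delivered_);
    }

    const std::vector<int>& delivered() const { return delivered_; }

private:
    bool resumeNewTopics_;
    std::map<std::string, ToyChild> children_;
    std::vector<int> delivered_;
};

// Returns the message ids that reached the listener.
std::vector<int> simulate(bool resumeNewTopics) {
    ToyParent parent(resumeNewTopics);
    parent.startSubscribing({"t1", "t2"});
    parent.publish("t1", 1);        // held back: listeners are still paused
    parent.onAllSubscribed();       // message 1 is delivered now
    parent.onTopicDiscovered("t3");
    parent.publish("t3", 2);        // delivered only if the new child was resumed
    return parent.delivered();
}
```

In this model, simulate(false) delivers only message 1, because the child for the newly discovered topic stays paused. simulate(true) delivers messages 1 and 2, which corresponds to resuming new listeners after new topics are added.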
BewareMyPower pushed a commit that referenced this pull request Apr 28, 2025
…481)

(cherry picked from commit 0a9b7d9)

Development

Successfully merging this pull request may close these issues.

[Bug] Ack failure on message listener in multi topics consumer
