[fix][broker] Handle BucketDelayedDeliveryTracker recover failed #22735

Merged

Conversation

dao-jun
Member

@dao-jun dao-jun commented May 17, 2024

Motivation

We initialize the DelayedDeliveryTracker when dispatching messages, by calling DelayedDeliveryTrackerFactory.newTracker in AbstractBaseDispatcher.

However, when delayedDeliveryTrackerFactoryClassName is set to org.apache.pulsar.broker.delayed.BucketDelayedDeliveryTrackerFactory, BucketDelayedDeliveryTracker recovery can fail (see here). The failure may be caused by a BookKeeper exception, a timeout exception, or something else, and we currently do not handle this case.

If the exception occurs, it may lead to memory leaks (Entries and OpReadEntry objects cannot be released) and other issues.
If BucketDelayedDeliveryTracker is never able to recover, the situation gets worse.

This PR introduces a fallback mechanism: if initializing BucketDelayedDeliveryTracker fails, the broker falls back to InMemoryDelayedDeliveryTracker.
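
The fallback idea can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: Tracker, BucketTracker, InMemoryTracker, RecoverException and newTracker(boolean) are hypothetical, simplified stand-ins for the Pulsar classes, and the recovery failure is simulated with a flag. The coverage report in this conversation lists a RecoverDelayedDeliveryTrackerException file, so the real change presumably signals the failure with a dedicated exception in a similar way.

```java
public class TrackerFallbackSketch {

    /** Hypothetical, simplified stand-in for Pulsar's DelayedDeliveryTracker. */
    public interface Tracker {
        String name();
    }

    /** Hypothetical checked exception signalling that bucket recovery failed. */
    public static class RecoverException extends Exception {
        public RecoverException(String message) {
            super(message);
        }
    }

    /** Simulated bucket-based tracker whose constructor performs recovery. */
    public static class BucketTracker implements Tracker {
        public BucketTracker(boolean recoverySucceeds) throws RecoverException {
            if (!recoverySucceeds) {
                // In a real broker this could be a BookKeeper or timeout error.
                throw new RecoverException("simulated bucket snapshot recovery failure");
            }
        }

        @Override
        public String name() {
            return "bucket";
        }
    }

    /** Simulated in-memory tracker that needs no recovery step. */
    public static class InMemoryTracker implements Tracker {
        @Override
        public String name() {
            return "in-memory";
        }
    }

    /** Try the bucket tracker first; fall back to in-memory if recovery fails. */
    public static Tracker newTracker(boolean recoverySucceeds) {
        try {
            return new BucketTracker(recoverySucceeds);
        } catch (RecoverException e) {
            System.err.println("Bucket tracker recovery failed, falling back to in-memory: "
                    + e.getMessage());
            return new InMemoryTracker();
        }
    }

    public static void main(String[] args) {
        System.out.println(newTracker(true).name());  // prints "bucket"
        System.out.println(newTracker(false).name()); // prints "in-memory"
    }
}
```

The point of the sketch is that the failure is caught at the factory boundary, so the dispatcher always receives a usable tracker instead of an exception that leaves resources unreleased.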

Modifications

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@dao-jun dao-jun added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/broker labels May 17, 2024
@dao-jun dao-jun added this to the 3.4.0 milestone May 17, 2024
@dao-jun dao-jun self-assigned this May 17, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label May 17, 2024
@coderzc
Member

coderzc commented May 17, 2024

@dao-jun Can you add tests to cover this case?

@dao-jun
Member Author

dao-jun commented May 17, 2024

@dao-jun Can you add tests to cover this case?

Yes, but before adding tests I want to get more feedback to make sure this change is reasonable.

@coderzc coderzc requested a review from mattisonchao May 17, 2024 07:31
@dao-jun
Member Author

dao-jun commented May 17, 2024

@coderzc could you please also help review #22707 when you are available?

@dao-jun dao-jun changed the title [fix][broker] Fix BucketDelayedDeliveryTracker unable to recover [fix][broker] Handle BucketDelayedDeliveryTracker recover failed May 17, 2024
@dao-jun dao-jun closed this May 17, 2024
@dao-jun dao-jun reopened this May 17, 2024
@codecov-commenter

codecov-commenter commented May 19, 2024

Codecov Report

Attention: Patch coverage is 67.27273%, with 18 lines in your changes missing coverage. Please review.

Project coverage is 73.19%. Comparing base (bbc6224) to head (ed9e5a2).
Report is 287 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #22735      +/-   ##
============================================
- Coverage     73.57%   73.19%   -0.39%     
+ Complexity    32624    32591      -33     
============================================
  Files          1877     1891      +14     
  Lines        139502   141466    +1964     
  Branches      15299    15519     +220     
============================================
+ Hits         102638   103543     +905     
- Misses        28908    29924    +1016     
- Partials       7956     7999      +43     
Flag Coverage Δ
inttests 27.40% <21.81%> (+2.81%) ⬆️
systests 24.60% <1.81%> (+0.28%) ⬆️
unittests 72.20% <67.27%> (-0.65%) ⬇️

Files Coverage Δ
...r/delayed/BucketDelayedDeliveryTrackerFactory.java 93.47% <100.00%> (+2.04%) ⬆️
...delayed/InMemoryDelayedDeliveryTrackerFactory.java 95.23% <100.00%> (+3.57%) ⬆️
...bucket/RecoverDelayedDeliveryTrackerException.java 100.00% <100.00%> (ø)
...sistent/PersistentDispatcherMultipleConsumers.java 73.80% <100.00%> (-0.53%) ⬇️
...r/delayed/bucket/BucketDelayedDeliveryTracker.java 83.12% <33.33%> (-0.58%) ⬇️
...rg/apache/pulsar/broker/service/BrokerService.java 81.81% <50.00%> (+1.03%) ⬆️
.../pulsar/broker/delayed/DelayedDeliveryTracker.java 20.00% <20.00%> (ø)

... and 349 files with indirect coverage changes

@dao-jun dao-jun requested a review from coderzc May 20, 2024 01:18
@dao-jun dao-jun requested a review from crossoverJie July 24, 2024 02:03
Member

@crossoverJie crossoverJie left a comment

LGTM.

@dao-jun dao-jun merged commit 1c53841 into apache:master Jul 24, 2024
50 checks passed
@dao-jun dao-jun deleted the fix/fix_delayed_delivery_potential_issue branch July 24, 2024 06:40
@dao-jun dao-jun mentioned this pull request Jul 24, 2024
15 tasks
lhotari pushed a commit that referenced this pull request Jul 29, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 29, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 30, 2024
Labels
area/broker cherry-picked/branch-3.0 cherry-picked/branch-3.3 doc-not-needed Your PR changes do not impact docs ready-to-test release/3.0.6 release/3.3.1 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
6 participants