Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve] [broker] Fix broker restart logic #20113

Merged

Conversation

wolfstudy
Copy link
Member

@wolfstudy wolfstudy commented Apr 17, 2023

Motivation

Currently, when we execute the logic of Broker Close, we first shut down the Broker Service, and then close the LoadManagerTask. This will cause the Broker to be assigned to a new bundle during the shutdown of the Broker, resulting in an inelegant shutdown of the Broker. , eventually triggering brokerShutdownTimeoutMs to forcibly shut down the Broker, causing the Client to fail to produce or consume messages over time.

image

When shutting down the Broker, we will first remove the temporary node of the current Broker from zookeeper, which is the logic we expect. We expect that the broker will not allocate new bundles during active shutdown of the broker. However, since the task of loadManagerExecutor has been regularly executed to report its own Load information to zookeeper, this will cause the Broker node to be shut down to be re-registered by loadManagerExecutor. At this time, the Leader Broker believes that the Broker node to be shut down can also allocate a new bundle, and will redistribute the new bundle information.

Interestingly, in the code logic of Pulsar Service shutdown, the comments clearly indicate that we expect LoadMangerTask to be closed before Broker Service, but in real code implementation, it is the opposite.

 // cancel loadShedding task and shutdown the loadManager executor before shutting down the broker

Modifications

Adjust the order of component shutdown when Broker is shut down so that LoadManagerTask is shut down before Broker Service

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@wolfstudy wolfstudy added type/bug The PR fixed a bug or issue reported a bug area/broker doc-not-needed Your PR changes do not impact docs labels Apr 17, 2023
@wolfstudy wolfstudy added this to the 3.0.0 milestone Apr 17, 2023
@wolfstudy wolfstudy self-assigned this Apr 17, 2023
@github-actions github-actions bot added doc-label-missing and removed doc-not-needed Your PR changes do not impact docs labels Apr 17, 2023
@github-actions
Copy link

@wolfstudy Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

Copy link
Contributor

@HQebupt HQebupt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@wolfstudy
Copy link
Member Author

/pulsarbot run-failure-checks

@wolfstudy wolfstudy changed the title [Improve] Fix broker restart logic [improve] [broker] Fix broker restart logic Apr 17, 2023
@wolfstudy wolfstudy added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Apr 17, 2023
@github-actions github-actions bot added doc-label-missing and removed doc-not-needed Your PR changes do not impact docs labels Apr 17, 2023
@wolfstudy wolfstudy added the doc-not-needed Your PR changes do not impact docs label Apr 17, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #20113 (5a4dff4) into master (c4aec66) will increase coverage by 39.78%.
The diff coverage is 100.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #20113       +/-   ##
=============================================
+ Coverage     33.16%   72.94%   +39.78%     
- Complexity    12225    31911    +19686     
=============================================
  Files          1499     1868      +369     
  Lines        114340   138318    +23978     
  Branches      12415    15214     +2799     
=============================================
+ Hits          37918   100897    +62979     
+ Misses        71484    29398    -42086     
- Partials       4938     8023     +3085     
Flag Coverage Δ
inttests 24.18% <0.00%> (?)
systests 24.87% <0.00%> (?)
unittests 72.23% <100.00%> (+39.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...n/java/org/apache/pulsar/broker/PulsarService.java 81.63% <100.00%> (+25.30%) ⬆️
...e/pulsar/io/elasticsearch/ElasticSearchConfig.java 90.67% <100.00%> (ø)

... and 1540 files with indirect coverage changes

@codelipenghui codelipenghui merged commit 092819b into apache:master Apr 18, 2023
codelipenghui pushed a commit that referenced this pull request Apr 18, 2023
codelipenghui pushed a commit that referenced this pull request Apr 18, 2023
codelipenghui pushed a commit that referenced this pull request Apr 18, 2023
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request May 11, 2023
(cherry picked from commit 092819b)
(cherry picked from commit 9e47fa6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants