-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] Fix the issue of topics possibly being deleted. #21704
Conversation
@@ -1632,6 +1631,11 @@ public CompletableFuture<Void> checkReplication() { | |||
} | |||
|
|||
List<String> configuredClusters = topicPolicies.getReplicationClusters().get(); | |||
if (CollectionUtils.isEmpty(configuredClusters)) { | |||
log.warn("[{}] No replication clusters configured", name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be an abnormal situation, please add some explanation here to indicate what happens here. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referring to the above flow diagram, method onUpdate()->checkReplication()
is called before method this.updateTopicPolicyByNamespacePolicy(policies)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe we can refine the log to let other users know about the possibility of a problem. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Is it possible to add a test? Maybe running it many times can reproduce it, which is also acceptable. :)
Its difficult to reproduce with unit tests, manually |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #21704 +/- ##
=============================================
+ Coverage 36.75% 73.37% +36.62%
- Complexity 12271 32767 +20496
=============================================
Files 1717 1893 +176
Lines 131197 140761 +9564
Branches 14339 15509 +1170
=============================================
+ Hits 48220 103290 +55070
+ Misses 76596 29360 -47236
- Partials 6381 8111 +1730
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@crossoverJie, can we just put a time-consuming task in the thread pool to simulate the It's a very critical fix, if don't have a test to protect the change, things will go wrong again in the future. Here is an example: #21686 |
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Jiwe Guo <[email protected]> (cherry picked from commit 84ea1ca)
Co-authored-by: Jiwe Guo <[email protected]> (cherry picked from commit 84ea1ca)
Co-authored-by: Jiwe Guo <[email protected]> (cherry picked from commit 84ea1ca)
Co-authored-by: Jiwe Guo <[email protected]> (cherry picked from commit 84ea1ca)
…#21704) Co-authored-by: Jiwe Guo <[email protected]> (cherry picked from commit 84ea1ca) (cherry picked from commit 49490b3)
Sorry, to clear the root cause of the issue, I left a comment: The
|
time | task: topic loading |
task: modifying topic policies |
---|---|---|
1 | Registered the topic to accept new changes of Topic Policies; the attribute replicationClusters is empty now |
|
2 | Modify topic policies | |
3 | Trigger a task checkReplication , and then it will try to delete the topic due to replicationClusters being empty |
|
4 | Initialize the attribute replicationClusters from Namespace level Replication settings |
Detail steps of the issue: conditions-2
time | task: topic loading |
task: TopicPoliciesService initialize after brokers restart |
---|---|---|
1 | Registered the topic to accept new changes of Topic Policies; the attribute replicationClusters is empty now |
|
2 | Read all topic policies and try to update topic-level policies for every topic under this broker after the Broker restart | |
3 | Trigger a task checkReplication , and then it will try to delete the topic due to replicationClusters being empty |
|
4 | Initialize the attribute replicationClusters from Namespace level Replication settings |
When there is more than one broker under your cluster, the probability of shedding topics to a new broker(when it has just started right now) is low, so the probability is high if there is only 1
broker.
Workaround-1
Before you upgrade to the new version(which contains this fix), disable the feature topicLevelPoliciesEnabled
, but it also makes all topic-level policies invalidate you set before.
Workaround-2
Before you upgrade to the new version(which contains this fix), do not restart brokers or modify any topic-level policies.
@poorbarcode But based on the background of |
Ah, I see. Pulsar reads all topic policies and calls @crossoverJie I updated the comment above, please review it, thanks |
@poorbarcode Great job, thanks for your supplement👍. |
…#21704) Co-authored-by: Jiwe Guo <[email protected]> (cherry picked from commit 84ea1ca) (cherry picked from commit 49490b3)
…#21704) Co-authored-by: Jiwe Guo <[email protected]> (cherry picked from commit 84ea1ca)
Fixes #21653
Motivation
Reproduce process:
#21653 (comment)
Modifications
The
registerTopicPolicyListener()
function was moved fromPersistentTopic()
toinitTopicPolicy()
.Flow diagram:
Current:

After:

Verifying this change
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: crossoverJie#19