Producers fail to open when the leader broker shuts down #7041
sijie added the labels area/broker, triage/week-22, and type/bug (The PR fixed a bug or issue reported a bug) on May 27, 2020.
codelipenghui pushed a commit that referenced this issue on May 31, 2020:

Master Issue: #7041

### Motivation

When a leader broker is restarted, some producers for topics owned by that broker may not be reopened on the new broker. When this happens, message publishing continues to fail until the client application is restarted.

During the investigation, I found that the lookup requests sent by the affected producers are redirected more than 10,000 times between multiple brokers. Each time a lookup request is redirected, `BinaryProtoLookupService#findBroker()` is called recursively. Tens of thousands of redirects therefore cause a `StackOverflowError`, and `BinaryProtoLookupService#findBroker()` never completes.

### Modifications

Limit the number of times a lookup can be redirected to 100. This maximum is user configurable. If the number of redirects exceeds the limit, the lookup fails. However, `ConnectionHandler` retries the lookup, so the producer can eventually reconnect to the new broker.
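The modification can be sketched in a few lines of Java. This is a simplified, hypothetical model, not Pulsar's `BinaryProtoLookupService`: `BoundedLookup`, the redirect map, and `lookupWithRetry` are illustrative names only. Each redirect increments a counter, and the lookup gives up once the counter exceeds the configured maximum, so the recursion depth stays bounded by the limit rather than by the number of redirects. The retry loop stands in for the `ConnectionHandler` retry described above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.IntFunction;

// Hypothetical model of topic lookup: each broker either owns the topic or redirects elsewhere.
final class BoundedLookup {
    // Broker name -> broker it redirects to. A broker missing from the map owns the topic.
    private final Map<String, String> redirects;
    // Upper bound on redirects per lookup (user configurable; 100 in the commit message).
    private final int maxRedirects;

    BoundedLookup(Map<String, String> redirects, int maxRedirects) {
        this.redirects = redirects;
        this.maxRedirects = maxRedirects;
    }

    // Follows redirects recursively, carrying a counter so the recursion depth is
    // bounded by maxRedirects. Returns empty once the limit is exceeded.
    Optional<String> findBroker(String broker, int redirectCount) {
        if (redirectCount > maxRedirects) {
            return Optional.empty(); // give up on this lookup; the caller may retry later
        }
        String next = redirects.get(broker);
        if (next == null) {
            return Optional.of(broker); // this broker owns the topic
        }
        return findBroker(next, redirectCount + 1);
    }

    // Retries the bounded lookup, standing in for the ConnectionHandler retry.
    // lookupForAttempt supplies the (hypothetical) cluster view seen on each attempt.
    static Optional<String> lookupWithRetry(IntFunction<BoundedLookup> lookupForAttempt,
                                            String startBroker, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Optional<String> owner = lookupForAttempt.apply(attempt).findBroker(startBroker, 0);
            if (owner.isPresent()) {
                return owner;
            }
            // A real client would wait here (e.g. with exponential backoff) before retrying.
        }
        return Optional.empty();
    }
}
```

Because a lookup caught in a redirect loop now fails after a bounded number of hops instead of recursing until the stack overflows, a later retry gets the chance to succeed once the brokers have settled on a new owner.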
When the leader broker shut down, producers that were connected to it failed to open on the new broker.
According to the leader broker's log, the broker unloaded its bundles and closed its producers.
The logs of the other brokers show that another broker became the leader and the bundles were loaded.
However, although the producers that had been connected to the old leader broker reconnected to a new broker, some of them failed to open on it.
After reconnecting to the new broker, these producers never sent `CommandProducer` messages and stopped making progress.

pulsar/pulsar-common/src/main/proto/PulsarApi.proto, lines 406 to 430 in d55bc00
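The commit message referencing this issue says that `BinaryProtoLookupService#findBroker()` never completes. One way a lookup can hang like that, and keep a producer from ever reaching the `CommandProducer` step, is sketched below. This is a simplified, hypothetical illustration, not Pulsar's code: `StalledLookup` and the `redirect:` response prefix are invented for the example. The result future is completed manually from inside a callback, so if the callback throws (here a simulated `StackOverflowError`), the error is recorded on the stage returned by `thenAccept`, which is discarded, and the result future stays incomplete.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration (not Pulsar's code) of a result future that can stay incomplete forever.
final class StalledLookup {
    // Completes addressFuture manually from inside a callback on the lookup response.
    static CompletableFuture<String> findBroker(CompletableFuture<String> lookupResponse) {
        CompletableFuture<String> addressFuture = new CompletableFuture<>();
        lookupResponse.thenAccept(response -> {
            if (response.startsWith("redirect:")) {
                // Simulates the StackOverflowError raised after too many nested redirects.
                throw new StackOverflowError("simulated deep recursion");
            }
            addressFuture.complete(response);
        });
        // thenAccept catches the error and completes its own returned stage exceptionally.
        // That stage is discarded here, so addressFuture is never completed.
        return addressFuture;
    }
}
```

In this model, any step chained on `addressFuture` (for example, opening the producer) simply never runs for a redirected response, and no error is surfaced to the caller.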
Expected behavior
When producers reconnect to a new broker, they open successfully on that broker.
Actual behavior
Some producers failed to open on the new broker.
Steps to reproduce
We tried, but have not been able to reproduce it yet.
System configuration
OS(Broker): CentOS 7.7
Pulsar Broker: 2.3.2
Pulsar Client Java: 2.3.2