[fix] [client] Messages lost when consumer reconnect #20695
// responsible for reconnection. And the variable "duringConnect" will prevent the concurrent execution.
if (getState() == State.Ready) {
    return CompletableFuture.completedFuture(null);
}
By `duringConnect`, did you mean `duringSeek` in the current method at line 785? I am assuming this because I couldn't find the word `duringConnect` within the method or the class itself.
If `duringConnect` is defined outside, maybe providing JavaDoc with a `{@link }` tag will help readers to follow.
Fixed, thanks
A clear patch! LGTM.
pulsar-broker/src/test/java/org/apache/pulsar/client/api/SimpleProducerConsumerTest.java
Good catch!
(cherry picked from commit 09c89cd)
Reopen #20591
Motivation
Background of consumer reconnects
- Send `CMD-subscribe` to the broker
- Send `flow permits` to the broker to increment `availablePermits`

Background of scenarios that could trigger reconnection:
- A `cmd-close_consumer` from the broker, such as on `unload topic`, `reset clusters`, and so on.

Background of the response of broker received subscribe request
The broker just responds with `success` if it receives a second `subscribe` request from the same consumer: https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1197
I wanted to prevent this by adding a validation so that the in-memory messages are not cleared when the method `grabCnx` executes after the subscribe has finished, which would result in messages being lost.

But I noticed that the method `grabCnx` already contains the check `consumer.cnx == null`. If the `state` of the consumer is `Ready`, the variable `cnx` of the consumer cannot be null, so this check is as effective as the check `consumer.state != Ready`.

I also noticed the issues below:
Issue-1
If the method `grab connection` is executed multiple times, some messages will be lost due to a race condition, for example (steps listed in time order):

| `grab connection 1` | `grab connection 2` |
| --- | --- |
| check `cnx` of the consumer is null | |
| | check `cnx` of the consumer is null |
| set `duringConnect` to `true` | |
| `consumer.connectionOpened()` | |
| set `consumer.cnx` | |
| update `availablePermits` | |
| set `duringConnect` to `false` | |
| | set `duringConnect` to `true` |
| | `consumer.connectionOpened()` |
| | set `consumer.cnx` |
| | update `availablePermits` |
| | set `duringConnect` to `false` |

We should make the check `consumer.cnx == null` execute after the check `compare and set duringConnect`.
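
The ordering proposed for Issue-1 (claim `duringConnect` first, check `consumer.cnx == null` second) can be sketched roughly as below. This is a simplified illustration, not the actual Pulsar client code: the class `GrabCnxOrderingSketch`, the counter `connectionOpenedCalls`, and the synchronous release of the flag are assumptions made for the example. In the real client the connection is opened asynchronously.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the consumer's connection handling; not Pulsar's real class.
class GrabCnxOrderingSketch {
    private final AtomicBoolean duringConnect = new AtomicBoolean(false);
    private volatile Object cnx = null;   // stand-in for consumer.cnx
    int connectionOpenedCalls = 0;        // how often in-memory state would be reset

    /** Returns true only if this call actually performed a (re)connect. */
    boolean grabCnx() {
        // Step 1: claim the flag, so at most one caller proceeds at a time.
        if (!duringConnect.compareAndSet(false, true)) {
            return false;
        }
        try {
            // Step 2: check cnx only while holding the flag. A second caller
            // can no longer act on a null check made before the first caller connected.
            if (cnx != null) {
                return false;
            }
            connectionOpened();
            return true;
        } finally {
            duringConnect.set(false);
        }
    }

    private void connectionOpened() {
        connectionOpenedCalls++;   // the real client resets in-memory consumer state here
        cnx = new Object();
    }

    public static void main(String[] args) throws InterruptedException {
        GrabCnxOrderingSketch consumer = new GrabCnxOrderingSketch();
        Thread t1 = new Thread(consumer::grabCnx);
        Thread t2 = new Thread(consumer::grabCnx);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Prints 1: only one of the two racing calls reaches connectionOpened().
        System.out.println("connectionOpened calls: " + consumer.connectionOpenedCalls);
    }
}
```

With the check placed before the compare-and-set, both callers could pass the null check and each would later run `connectionOpened()`, which is the interleaving shown in the Issue-1 example.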
Issue-2
After we fixed `issue-1` by switching the order of the check `consumer.cnx == null` and the check `duringConnect == false`, a new issue occurs.

Since we use the variable `duringConnect` to prevent multiple `grabCnx` calls from running at the same time, we should make `set consumer.cnx to null` execute before `set duringConnect to true` when the subscribe request fails. This avoids the issue below (steps listed in time order):

| `reconnect later` | `grab connection` |
| --- | --- |
| set `duringConnect` to `false` | |
| | set `duringConnect` to `true` |
| | check `cnx` of the consumer is null |
| set `consumer.cnx` to null | |
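
A minimal sketch of the ordering described for Issue-2: on a failed subscribe, clear the connection first and release `duringConnect` afterwards, so that the next `grabCnx` call that wins the flag also sees a null `cnx`. The class and method names (`ReconnectLaterOrderingSketch`, `beginSubscribe`, `onSubscribeFailed`) are hypothetical and only illustrate the ordering. They are not the Pulsar client API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of the failure path; not Pulsar's real class.
class ReconnectLaterOrderingSketch {
    private final AtomicBoolean duringConnect = new AtomicBoolean(false);
    private volatile Object cnx = null;   // stand-in for consumer.cnx

    // Simulates a subscribe in flight: the flag is held and cnx is already set.
    void beginSubscribe(Object connection) {
        duringConnect.set(true);
        cnx = connection;
    }

    // Failure path: clear cnx BEFORE releasing the flag.
    void onSubscribeFailed() {
        cnx = null;
        duringConnect.set(false);
    }

    /** Returns true only if this call actually performed a (re)connect. */
    boolean grabCnx() {
        if (!duringConnect.compareAndSet(false, true)) {
            return false;   // another connect attempt is still in progress
        }
        try {
            if (cnx != null) {
                return false;   // already connected, nothing to do
            }
            cnx = new Object();   // stand-in for opening a new connection
            return true;
        } finally {
            duringConnect.set(false);
        }
    }

    public static void main(String[] args) {
        ReconnectLaterOrderingSketch consumer = new ReconnectLaterOrderingSketch();
        consumer.beginSubscribe(new Object());
        System.out.println("while subscribing: " + consumer.grabCnx());    // false
        consumer.onSubscribeFailed();
        System.out.println("after failed subscribe: " + consumer.grabCnx()); // true
    }
}
```

If the two statements in `onSubscribeFailed` were swapped, a `grabCnx` call running between them could win the flag, still see the old non-null `cnx`, and skip the reconnect.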
Modifications
- Make `set consumer.cnx to null` execute before `set duringConnect to true` when the subscribe request fails

Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: x