Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue 8093]Fix client lookup hangs when broker restarts #8101

Merged
merged 4 commits into from
Sep 24, 2020
Merged

[Issue 8093]Fix client lookup hangs when broker restarts #8101

merged 4 commits into from
Sep 24, 2020

Conversation

4onni
Copy link
Contributor

@4onni 4onni commented Sep 22, 2020

Fixes #8093

Motivation

Client hangs forever when all brokers stop and then restart.
There are several steps need to be finished before the broker can be fully started, as illustrated in the pseudo code below:

PulsarService#start():
    broker.start(); // Step 1
    webService.start(); // Step 2
    leaderElectionService.start(); //Step 3

If a lookup request gets in between Step 2 and Step 3, a NPE would be thrown, which will block all other coming requests from getting processed properly.

Modifications

Client can only connect to the broker after the election service started successfully

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

    • Added 2 test cases under LeaderElectionServiceTest

@codelipenghui codelipenghui added the type/bug The PR fixed a bug or issue reported a bug label Sep 22, 2020
@codelipenghui codelipenghui added this to the 2.7.0 milestone Sep 22, 2020
@@ -400,6 +400,11 @@ public boolean registerNamespace(String namespace, boolean ensureOwned) throws P
private void searchForCandidateBroker(NamespaceBundle bundle,
CompletableFuture<Optional<LookupResult>> lookupFuture,
LookupOptions options) {
if( null == pulsar.getLeaderElectionService() || ! pulsar.getLeaderElectionService().isElected()) {
LOG.warn("The leader election has not yet been completed! NamespaceBundle[{}]", bundle);
lookupFuture.completeExceptionally(new IllegalStateException("The leader election has not yet been completed!"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great contribution, I think it's better to return a retryable exception to the client? So that the client can reconnect later. Does this make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @4onni, this will resultin a client lookup timeout, and the client will relookup later, so it's not a problem here.

@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

4 similar comments
@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

@jiazhai
Copy link
Member

jiazhai commented Sep 22, 2020

/pulsarbot run-failure-checks

@jiazhai
Copy link
Member

jiazhai commented Sep 23, 2020

/pulsarbot run-failure-checks

@jiazhai
Copy link
Member

jiazhai commented Sep 23, 2020

/pulsarbot run-failure-checks

4 similar comments
@jiazhai
Copy link
Member

jiazhai commented Sep 23, 2020

/pulsarbot run-failure-checks

@jiazhai
Copy link
Member

jiazhai commented Sep 23, 2020

/pulsarbot run-failure-checks

@jiazhai
Copy link
Member

jiazhai commented Sep 23, 2020

/pulsarbot run-failure-checks

@jiazhai
Copy link
Member

jiazhai commented Sep 23, 2020

/pulsarbot run-failure-checks

@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

@sijie sijie merged commit 65cf9c0 into apache:master Sep 24, 2020
jiazhai pushed a commit to jiazhai/pulsar that referenced this pull request Sep 24, 2020
Fixes apache#8093

### Motivation

Client hangs forever when all brokers stop and then restart.
There are several steps need to be finished before the broker can be fully started, as illustrated in the pseudo code below:

```
PulsarService#start():
    broker.start(); // Step 1
    webService.start(); // Step 2
    leaderElectionService.start(); //Step 3
```
If a lookup request gets in between Step 2 and Step 3, a NPE would be thrown, which will block all other coming requests from getting processed properly.

### Modifications

Client can only connect to the broker after the election service started successfully

### Verifying this change

This change added tests and can be verified as follows:
 - * Added 2 test cases under `LeaderElectionServiceTest`
(cherry picked from commit 65cf9c0)
jiazhai pushed a commit to jiazhai/pulsar that referenced this pull request Sep 24, 2020
Fixes apache#8093

Client hangs forever when all brokers stop and then restart.
There are several steps need to be finished before the broker can be fully started, as illustrated in the pseudo code below:

```
PulsarService#start():
    broker.start(); // Step 1
    webService.start(); // Step 2
    leaderElectionService.start(); //Step 3
```
If a lookup request gets in between Step 2 and Step 3, a NPE would be thrown, which will block all other coming requests from getting processed properly.

Client can only connect to the broker after the election service started successfully

This change added tests and can be verified as follows:
 - * Added 2 test cases under `LeaderElectionServiceTest`
(cherry picked from commit 65cf9c0)
lbenc135 pushed a commit to lbenc135/pulsar that referenced this pull request Oct 3, 2020
Fixes apache#8093

### Motivation

Client hangs forever when all brokers stop and then restart.
There are several steps need to be finished before the broker can be fully started, as illustrated in the pseudo code below:

```
PulsarService#start():
    broker.start(); // Step 1
    webService.start(); // Step 2
    leaderElectionService.start(); //Step 3
```
If a lookup request gets in between Step 2 and Step 3, a NPE would be thrown, which will block all other coming requests from getting processed properly.

### Modifications

Client can only connect to the broker after the election service started successfully

### Verifying this change

This change added tests and can be verified as follows:
 - * Added 2 test cases under `LeaderElectionServiceTest`
@jiazhai
Copy link
Member

jiazhai commented Oct 16, 2020

already merged into branch-2.6 for 2.6.2 release
a4a363cb3eeff7f76466fe078efe6fb7c793d136 

wolfstudy added a commit that referenced this pull request Oct 30, 2020
wolfstudy pushed a commit that referenced this pull request Oct 30, 2020
Fixes #8093

### Motivation

Client hangs forever when all brokers stop and then restart.
There are several steps need to be finished before the broker can be fully started, as illustrated in the pseudo code below:

```
PulsarService#start():
    broker.start(); // Step 1
    webService.start(); // Step 2
    leaderElectionService.start(); //Step 3
```
If a lookup request gets in between Step 2 and Step 3, a NPE would be thrown, which will block all other coming requests from getting processed properly.

### Modifications

Client can only connect to the broker after the election service started successfully

### Verifying this change

This change added tests and can be verified as follows:
 - * Added 2 test cases under `LeaderElectionServiceTest`

(cherry picked from commit 65cf9c0)
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Nov 13, 2020
Fixes apache#8093

### Motivation

Client hangs forever when all brokers stop and then restart.
There are several steps need to be finished before the broker can be fully started, as illustrated in the pseudo code below:

```
PulsarService#start():
    broker.start(); // Step 1
    webService.start(); // Step 2
    leaderElectionService.start(); //Step 3
```
If a lookup request gets in between Step 2 and Step 3, a NPE would be thrown, which will block all other coming requests from getting processed properly.

### Modifications

Client can only connect to the broker after the election service started successfully

### Verifying this change

This change added tests and can be verified as follows:
 - * Added 2 test cases under `LeaderElectionServiceTest`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release/2.5.3 release/2.6.2 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Client hangs forever when broker restarts
4 participants