
unfollow call not returning when remote cluster is not available #53174

@jepert

Description

Elasticsearch version (bin/elasticsearch --version):

Version: 7.6.1, Build: default/rpm/aa751e09be0a5072e8570670309b1f12348f023b/2020-02-29T00:15:25.529771Z, JVM: 13.0.2

Plugins installed: []

JVM version (java -version):

openjdk version "13.0.2" 2020-01-14
OpenJDK Runtime Environment AdoptOpenJDK (build 13.0.2+8)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 13.0.2+8, mixed mode, sharing)

OS version (uname -a if on a Unix-like system):

Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
While testing a disaster recovery scenario with CCR, I found two different behaviours when unfollowing indices while the leader is not available (the remote cluster is down):

  1. The unfollow call returns immediately with connect_transport_exception exceptions.

  2. The unfollow call never returns (I waited for more than an hour).
     The cluster shows the unfollow task running, but it never dies (see the task-listing sketch below).
     This second case seems to happen with indices where number_of_shards > 1.

Expected behavior: both calls return with a status.
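
A quick way to confirm the second case is stuck is to list the in-flight unfollow tasks on the follower cluster. A minimal sketch using the tasks API (the action name is the one that appears in the task listing further down; host and credentials are the ones used throughout this reproduction):

#List any in-flight unfollow tasks and their parent/child relationships
curl -XGET 'http://192.168.1.226:9200/_tasks?actions=indices:admin/xpack/ccr/unfollow&detailed=true&pretty' -u elastic:badpassword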

Steps to reproduce:
To reproduce, I did a fresh install of two 3-node clusters with the latest version of ES (rpm).

elasticsearch.yml:

cluster.name: primary
node.name: node1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: _eth0_
http.port: 9200
discovery.seed_hosts: ["192.168.1.219", "192.168.1.222", "192.168.1.223"]
cluster.initial_master_nodes: ["node1", "node2", "node3"]
gateway.recover_after_nodes: 2
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
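
The config above is for the primary (leader) cluster. The secondary (follower) cluster was set up the same way; a sketch of its elasticsearch.yml, assuming the node names and addresses that appear in the task listing and attached log file names below, with everything else mirroring the primary config:

# follower cluster (cluster/node names and addresses assumed from the task listing and log file names below)
cluster.name: secondary
node.name: node1s
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: _eth0_
http.port: 9200
discovery.seed_hosts: ["192.168.1.224", "192.168.1.225", "192.168.1.226"]
cluster.initial_master_nodes: ["node1s", "node2s", "node3s"]
gateway.recover_after_nodes: 2
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12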

Steps:

#Start trial on both clusters

curl -X POST "192.168.1.219:9200/_license/start_trial?acknowledge=true&pretty" -u elastic:badpassword
curl -X POST "192.168.1.226:9200/_license/start_trial?acknowledge=true&pretty" -u elastic:badpassword

#remote cluster connection

curl -u elastic:badpassword -X PUT "192.168.1.226:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster": {
      "remote": {
        "remote-primary": {
          "seeds": [
            "192.168.1.219:9300"
          ],
          "transport.ping_schedule": "30s"
        }
      }
    }
  }
}
'

#Verify remote

curl -XGET  'http://192.168.1.226:9200/_remote/info' -u elastic:badpassword 

#Create leader index with 1 shard on primary cluster

curl -u elastic:badpassword -X PUT "192.168.1.219:9200/testing-11?pretty" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "index" : {
            "number_of_shards" : 1, 
            "number_of_replicas" : 1 
        }
    }
}
'

#Create leader index with 2 shards on primary cluster

curl -u elastic:badpassword -X PUT "192.168.1.219:9200/testing-21?pretty" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "index" : {
            "number_of_shards" : 2, 
            "number_of_replicas" : 1 
        }
    }
}
'

#Verify new indices

curl -XGET 'http://192.168.1.219:9200/_cat/indices/test*?v' -u elastic:badpassword

#Create followers on secondary cluster

curl -u elastic:badpassword -X PUT "192.168.1.226:9200/testing-11/_ccr/follow?wait_for_active_shards=1&pretty" -H 'Content-Type: application/json' -d'
{
  "remote_cluster" : "remote-primary",
  "leader_index" : "testing-11"
}
'


curl -u elastic:badpassword -X PUT "192.168.1.226:9200/testing-21/_ccr/follow?wait_for_active_shards=1&pretty" -H 'Content-Type: application/json' -d'
{
  "remote_cluster" : "remote-primary",
  "leader_index" : "testing-21"
}
'

#Verify followers

curl -XGET  'http://192.168.1.226:9200/_cat/indices/test*?v' -u elastic:badpassword
curl -XGET  'http://192.168.1.226:9200/_ccr/stats?pretty' -u elastic:badpassword

#Shut down the primary cluster nodes (run on each node of the primary cluster)

systemctl stop elasticsearch

#Verify ccr status

curl -XGET  'http://192.168.1.226:9200/_remote/info' -u elastic:badpassword
curl -XGET  'http://192.168.1.226:9200/_ccr/stats?pretty&filter_path=follow_stats.indices.shards.read_exceptions.*' -u elastic:badpassword

#Pause following on both indices and verify

curl -u elastic:badpassword -X POST "192.168.1.226:9200/testing-11/_ccr/pause_follow?pretty"
curl -u elastic:badpassword -X POST "192.168.1.226:9200/testing-21/_ccr/pause_follow?pretty"
curl -XGET  'http://192.168.1.226:9200/testing-*/_ccr/info?pretty' -u elastic:badpassword

#Close

curl -u elastic:badpassword -X POST "192.168.1.226:9200/testing-11/_close?pretty"
curl -u elastic:badpassword -X POST "192.168.1.226:9200/testing-21/_close?pretty"
curl -XGET  'http://192.168.1.226:9200/_cat/indices/test*?v' -u elastic:badpassword

#unfollow

#This call to unfollow returns a connect_transport_exception

curl -u elastic:badpassword -X POST "192.168.1.226:9200/testing-11/_ccr/unfollow?pretty"

#This call to unfollow never returns

curl -u elastic:badpassword -X POST "192.168.1.226:9200/testing-21/_ccr/unfollow?pretty"

#Verify long running tasks

curl -XGET  'http://192.168.1.226:9200/_cat/tasks?v' -u elastic:badpassword
action                           task_id                      parent_task_id              type      start_time    timestamp running_time ip            node
indices:admin/xpack/ccr/unfollow V-frNb4_Sc2RTGZ_uYne-g:665   -                           transport 1583414050796 13:14:10  2h           192.168.1.226 node3s
indices:admin/xpack/ccr/unfollow dm-h5Zx7TxCUoOvn1I_DWQ:1208  V-frNb4_Sc2RTGZ_uYne-g:665  transport 1583414050798 13:14:10  2h           192.168.1.224 node2s
cluster:monitor/tasks/lists      V-frNb4_Sc2RTGZ_uYne-g:8609  -                           transport 1583421254522 15:14:14  756.7micros  192.168.1.226 node3s
cluster:monitor/tasks/lists[n]   V-frNb4_Sc2RTGZ_uYne-g:8610  V-frNb4_Sc2RTGZ_uYne-g:8609 direct    1583421254523 15:14:14  241.5micros  192.168.1.226 node3s
cluster:monitor/tasks/lists[n]   dm-h5Zx7TxCUoOvn1I_DWQ:16577 V-frNb4_Sc2RTGZ_uYne-g:8609 transport 1583421254530 15:14:14  247.2micros  192.168.1.224 node2s
cluster:monitor/tasks/lists[n]   YIzbKYCoQ-SDyJTkAVSBQg:8917  V-frNb4_Sc2RTGZ_uYne-g:8609 transport 1583421254533 15:14:14  343.3micros  192.168.1.225 node1s
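
For completeness, the stuck parent task can also be inspected directly, and a cancel can be attempted (task ID taken from the listing above; it is not clear whether the unfollow task is actually cancellable):

#Inspect the stuck parent unfollow task
curl -XGET 'http://192.168.1.226:9200/_tasks/V-frNb4_Sc2RTGZ_uYne-g:665?pretty' -u elastic:badpassword

#Attempt to cancel it (may have no effect if the task is not cancellable)
curl -u elastic:badpassword -X POST "192.168.1.226:9200/_tasks/V-frNb4_Sc2RTGZ_uYne-g:665/_cancel?pretty"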

Provide logs (if relevant):

secondary-v761-node1s.log

secondary-v761-node2s.log

secondary-v761-node3s.log

Metadata

    Labels

    :Distributed Indexing/CCR, >bug
