[BUG][Docker Swarm Multi Node Cluster connectivity issue] #224

Open
hamzaismaeel15 opened this issue Feb 13, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@hamzaismaeel15

hamzaismaeel15 commented Feb 13, 2025

Description:

My OpenSearch cluster will not form. I am running three virtual machines joined into a Docker Swarm cluster, and I want to run one OpenSearch container on each virtual machine and have them form a single cluster.

To Reproduce:

Steps to reproduce the behavior:

  • Create a three-VM Docker Swarm cluster and add the node labels used by the placement constraints in the compose file below.
  1. Copy the compose content below into a docker-compose.yml file.
  2. Deploy it with docker stack deploy -c 'file-name' 'cluster-name', e.g. docker stack deploy -c docker-compose.yml opensearch

docker-compose.yml

version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:2.18.0
    #container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=Hamza@31017
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - /opt/os/data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600
      - 9300:9300
    deploy:
      placement:
        constraints:
          - "node.labels.db == ubuntu"
    networks:
      - opensearch-net

  opensearch-node2:
    image: opensearchproject/opensearch:2.18.0
    #container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=Hamza@31017
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - /opt/os/data2:/usr/share/opensearch/data
    deploy:
      placement:
        constraints:
          - "node.labels.db == node1"
    networks:
      - opensearch-net

  opensearch-node3:
    image: opensearchproject/opensearch:2.18.0
    #container_name: opensearch-node3
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=Hamza@31017
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - /opt/os/data3:/usr/share/opensearch/data
    deploy:
      placement:
        constraints:
          - "node.labels.db == node2"
    networks:
      - opensearch-net

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.18.0
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      - 'OPENSEARCH_HOSTS=["http://opensearch-node1:9200","http://opensearch-node2:9200","http://opensearch-node3:9200"]'
      - "DISABLE_SECURITY_DASHBOARDS_PLUGIN=true"
    networks:
      - opensearch-net

networks:
  opensearch-net:
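
A quick sanity check on a compose file like the one above is to confirm that every OpenSearch service sets a distinct node.name. The following is a minimal Python sketch, not part of the original report. The function name duplicate_node_names is hypothetical, and the inline sample simply repeats the three node.name lines that appear in the file above. Whether a repeated name is related to the failure reported here is not established in this thread.

```python
import re
from collections import Counter


def duplicate_node_names(compose_text):
    """Return the node.name values that occur more than once in the text."""
    # Match the value after "node.name=" up to whitespace or a closing quote.
    names = re.findall(r"node\.name=([^\s\"']+)", compose_text)
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Sample: the node.name entries as they appear in the compose file above.
sample = """
- node.name=opensearch-node1
- node.name=opensearch-node2
- node.name=opensearch-node2
"""

print("duplicate node.name values:", duplicate_node_names(sample))
```

To check a real file, the text of docker-compose.yml would be read and passed to duplicate_node_names in place of the sample.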

ISSUE:

[WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-node1] cluster-manager not discovered or elected yet, an election requires at least 2 nodes with ids from [hTfIxK_mST-qJgwiGS0w6w, N8R2kscGT1GAbK-L-mj_qQ, Yesi_vQZQyC-iCwDRkrjuw], have discovered [{opensearch-node1}{hTfIxK_mST-qJgwiGS0w6w}{SCJ76MhQQT6Q2ql-KMPpBg}{10.0.0.80}{10.0.0.80:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-node2}{4J085Ma_T3qvuPUshCdLBA}{TodgiY4lQ6KY5RF8pZblNQ}{10.0.1.17}{10.0.1.17:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-node2}{MrJ4dBqdQO2QlHp0RDulPA}{qCoQee04QuaSsO-J8Wc8mQ}{10.0.1.18}{10.0.1.18:9300}{dimr}{shard_indexing_pressure_enabled=true}] which is not a quorum; discovery will continue using [10.0.1.5:9300, 10.0.1.7:9300, 10.0.1.10:9300] from hosts providers and [{opensearch-node1}{hTfIxK_mST-qJgwiGS0w6w}{SCJ76MhQQT6Q2ql-KMPpBg}{10.0.0.80}{10.0.0.80:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 3, last-accepted version 89 in term 3
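
The warning can be read mechanically by comparing the node IDs that the election requires with the IDs of the nodes that were actually discovered. The sketch below is a hedged illustration in Python, not output from OpenSearch tooling. The helper name parse_warning is hypothetical, the regular expressions assume the message layout shown above, and the WARNING string is an excerpt of that log line.

```python
import re

# Excerpt of the warning above, from "an election requires" to "not a quorum".
WARNING = (
    "an election requires at least 2 nodes with ids from "
    "[hTfIxK_mST-qJgwiGS0w6w, N8R2kscGT1GAbK-L-mj_qQ, Yesi_vQZQyC-iCwDRkrjuw], "
    "have discovered ["
    "{opensearch-node1}{hTfIxK_mST-qJgwiGS0w6w}{SCJ76MhQQT6Q2ql-KMPpBg}"
    "{10.0.0.80}{10.0.0.80:9300}{dimr}{shard_indexing_pressure_enabled=true}, "
    "{opensearch-node2}{4J085Ma_T3qvuPUshCdLBA}{TodgiY4lQ6KY5RF8pZblNQ}"
    "{10.0.1.17}{10.0.1.17:9300}{dimr}{shard_indexing_pressure_enabled=true}, "
    "{opensearch-node2}{MrJ4dBqdQO2QlHp0RDulPA}{qCoQee04QuaSsO-J8Wc8mQ}"
    "{10.0.1.18}{10.0.1.18:9300}{dimr}{shard_indexing_pressure_enabled=true}"
    "] which is not a quorum"
)


def parse_warning(message):
    """Return (required_ids, discovered) where discovered is [(name, node_id)]."""
    required = re.search(r"with ids from \[([^\]]*)\]", message)
    found = re.search(r"have discovered \[(.*?)\] which is not a quorum", message)
    required_ids = []
    if required:
        required_ids = [part.strip() for part in required.group(1).split(",")]
    discovered = []
    if found:
        # Entries are separated by ", " between a closing and an opening brace.
        for entry in re.split(r"(?<=\}),\s*(?=\{)", found.group(1)):
            fields = re.findall(r"\{([^{}]*)\}", entry)
            if len(fields) >= 2:
                # First field is the node name, second is the node ID.
                discovered.append((fields[0], fields[1]))
    return required_ids, discovered


required_ids, discovered = parse_warning(WARNING)
matching = [(name, node_id) for name, node_id in discovered if node_id in required_ids]

print("required ids:", required_ids)
print("discovered nodes:", discovered)
print("discovered nodes with a required id:", matching)
```

For this excerpt, only opensearch-node1 carries an ID from the required list, which is consistent with the "not a quorum" message. The thread does not establish why the other discovered IDs differ from the required ones.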

Expected behavior:

The cluster should form as it does with docker-compose on a single VM. If I remove the placement constraints from docker-compose.yml and run it on a single VM, it works fine. I need this to work with Docker Swarm, with one container on each node and all of them forming one cluster.

Host/Environment:

  • OS: Linux - Ubuntu
  • Version 20.04

hamzaismaeel15 added the bug and untriaged labels Feb 13, 2025
hamzaismaeel15 changed the title from [BUG][Docker Swarm Multi Node Cluster] to [BUG][Docker Swarm Multi Node Cluster connectivity issue] Feb 13, 2025

@DandyDeveloper
Collaborator

@hamzaismaeel15 That looks more like you only have a single node running in the swarm? Are you sure all the containers are running correctly?

DandyDeveloper removed the untriaged label Feb 17, 2025

@hamzaismaeel15
Author

@DandyDeveloper All containers are deployed on different virtual machines. As you can see from the placement constraints defined in the file, each container is deployed to its own dedicated virtual machine. If possible, could you join me on a call to see the setup? Thank you.

@DandyDeveloper
Collaborator

@hamzaismaeel15 Sorry, I'm unable to join a call, but if possible, can you go into the container and verify that your CRI / CNI has appropriately configured the hostnames and that the other nodes are resolvable on the network?

The implication is certainly that either the swarm side of things or the hosts are misconfigured, but we'll need a lot more info to dive into it.

Exec into a container and try polling the other nodes to establish whether the networking is set up correctly or not.
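
One way to carry out that check is sketched below in Python. It is only an illustration under stated assumptions: a Python 3 interpreter must be available in some container attached to the opensearch-net network (the OpenSearch image itself may not include one), 9300 is taken as the transport port as in the compose file, and the function name probe is hypothetical.

```python
import socket
import sys


def probe(host, port=9300, timeout=3.0):
    """Resolve host, then try a TCP connection; return a short status line."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"{host}: DNS resolution failed ({exc})"
    # Use the first address returned by the resolver.
    address = infos[0][4][0]
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return f"{host}: resolved to {address}, port {port} accepts connections"
    except OSError as exc:
        return f"{host}: resolved to {address}, port {port} connection failed ({exc})"


if __name__ == "__main__":
    # Host names are passed on the command line, one status line per host.
    for name in sys.argv[1:]:
        print(probe(name))
```

Saved as, say, probe.py, it could be run as python3 probe.py opensearch-node1 opensearch-node2 opensearch-node3. A resolution failure points at service discovery on the overlay network, while a failed connection after a successful resolution points at the transport port or the path between hosts.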
