Failed on connection exception #1

Open

SudhanshuBlaze opened this issue Feb 26, 2024 · 6 comments

Comments

@SudhanshuBlaze

Hi

I was able to follow your instructions, but I am getting this error at this step: hdfs dfs -copyFromLocal ./test.txt /test.txt

root@bc548e4d1633:/home/big_data# echo "test" > test.txt
root@bc548e4d1633:/home/big_data# hdfs dfs -copyFromLocal ./test.txt /test.txt
copyFromLocal: Call From bc548e4d1633/10.0.1.2 to master-node:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
root@bc548e4d1633:/home/big_data# hdfs dfs -ls /
ls: Call From bc548e4d1633/10.0.1.2 to master-node:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
root@bc548e4d1633:/home/big_data# 

Command used to run the container:

docker container run --rm -v hdfs_master_data_swarm:/home/hadoop/data/nameNode jwaresolutions/big-data-cluster: /usr/local/hadoop/bin/hadoop namenode -format

Output from master node:

root@bc548e4d1633:/home/big_data# jps
4145 Jps
466 DataNode
1431 ResourceManager
827 SecondaryNameNode
2463 Master
1839 NodeManager
root@bc548e4d1633:/home/big_data# echo "test" > test.txt
root@bc548e4d1633:/home/big_data# hdfs dfs -copyFromLocal ./test.txt /test.txt
copyFromLocal: Call From bc548e4d1633/10.0.1.2 to master-node:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
root@bc548e4d1633:/home/big_data# hdfs dfs -ls /
ls: Call From bc548e4d1633/10.0.1.2 to master-node:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
root@bc548e4d1633:/home/big_data# 
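Side note: no NameNode process appears in the jps output above, which would explain the connection-refused errors, since the NameNode is the daemon that listens on master-node:9000. If it failed to start, its log should say why. A minimal check, assuming the default log location under /usr/local/hadoop/logs (the same Hadoop home used in the format command above):

  ls /usr/local/hadoop/logs/ | grep -i namenode   # find the NameNode log file, then tail it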

Output from my worker node:

$ docker exec -it d3c6ef08dc84 bash
root@d3c6ef08dc84:/home/big_data# jps
1410 Jps
946 NodeManager
214 DataNode
443 SecondaryNameNode
1245 Worker
root@d3c6ef08dc84:/home/big_data# hdfs dfs -ls /
ls: Call From d3c6ef08dc84/10.0.1.8 to master-node:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
root@d3c6ef08dc84:/home/big_data# ssh master
ssh: Could not resolve hostname master: Name or service not known
root@d3c6ef08dc84:/home/big_data# 
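Note that the errors reference master-node, not master, so the failed ssh lookup may be a red herring. A quick way to check service-name resolution inside the overlay network (getent is available in most base images):

  getent hosts master-node   # should print the master's overlay IP if Swarm DNS is working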

Other info that might be helpful:

kumarsu@airii:~/Documents/GitHub/docker-big-data-cluster$ docker node ls
ID                            HOSTNAME     STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
nonoljtj9e5w0soxitv572ndv *   airii        Ready     Active         Leader           25.0.3
oqgghu8xpcg18w6r2uxxs0eaw     powerhorse   Ready     Active                          25.0.2
kumarsu@airii:~/Documents/GitHub/docker-big-data-cluster$ 
@Genarito
Member

Hi @SudhanshuBlaze, it's a known issue in Spark, at least in the version this repo is using (I'll update it in the future). Have you tried the steps listed in the FAQ?

Please try running the following commands inside the master node:

  1. stop-all.sh
  2. hadoop namenode -format
  3. start-all.sh
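
After step 3, a quick sanity check (assuming the Hadoop binaries are on the PATH, as in the transcripts above):

  jps | grep NameNode    # a NameNode process should now be listed on the master
  hdfs dfsadmin -report  # should print cluster capacity and the live datanodes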

Let me know if that worked for you 😄

@SudhanshuBlaze
Author

SudhanshuBlaze commented Feb 27, 2024

Hi,

Yes, it worked for me. I had to combine your solution with this answer I found on Stack Overflow, which was to update my conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9000</value>
  </property>
</configuration>
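
In case it is useful to others: the value can be verified from inside the container. Note that fs.default.name is the deprecated alias for fs.defaultFS in recent Hadoop versions, so either key name should reflect the change:

  hdfs getconf -confKey fs.defaultFS   # should print hdfs://0.0.0.0:9000 after a restart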

@Genarito
Member

Thanks! I'll add it to the default conf!

@SudhanshuBlaze
Author

Hi @Genarito

I just realized I am facing another issue.

When I go to port 9870 to see the HDFS web UI, the Live Nodes section shows 0 live nodes, even though I have 3 Docker containers running as worker nodes. What could be a possible solution to this problem? I have attached screenshots for your reference.

My OS:

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal

[Screenshot from 2024-02-27 17-34-42]
[Screenshot from 2024-02-27 17-35-45]
[Screenshot from 2024-02-27 17-36-17]
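
In case it helps with debugging, the DataNodes' registration status can also be checked from the command line on the master (standard Hadoop tooling, not specific to this repo):

  hdfs dfsadmin -report   # lists the live/dead datanodes as the NameNode sees them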

@Genarito
Member

Hi @SudhanshuBlaze,
Thanks for the report! I've had that problem too. However, don't worry: Spark will still use the 3 workers you have to delegate tasks. I suspect it's just a Hadoop-specific configuration issue.

I'm finishing my PhD thesis; once it's done, I'll get around to updating the technologies in this repository and improving the configuration to reduce the problems mentioned.

I will leave this issue open to follow up on the problem in the future.
Thanks and best regards!

@SudhanshuBlaze
Author

My recommendation would be to specify a subnet when creating the Docker overlay network for the Swarm:

docker network create -d overlay --subnet={subnet range} --attachable {network name}
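
For example (the subnet range and network name here are only illustrative, chosen to match the addresses used in the commands below):

  docker network create -d overlay --subnet=10.0.9.0/24 --attachable hadoop-overlay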

Also, add host entries when running the containers:

docker run --rm --name hadoop-master --net hadoop-overlay --ip 10.0.9.22 --hostname hadoop-master -it --add-host hadoop-slave-1:10.0.9.23 --add-host hadoop-slave-2:10.0.9.24 -p 50090:50090 -p 8088:8088 -p 50070:50070 hadoop-master
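
Presumably each worker is started the same way, with its own IP and hostname plus --add-host entries for the other nodes; a hypothetical example mirroring the master command above (the hadoop-slave image name is assumed):

  docker run --rm --name hadoop-slave-1 --net hadoop-overlay --ip 10.0.9.23 --hostname hadoop-slave-1 -it --add-host hadoop-master:10.0.9.22 --add-host hadoop-slave-2:10.0.9.24 hadoop-slave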
