Solution For Connectivity Problem When Submitting Work From Master Node To Worker(s) #8
Comments
@Thelin90 thank you so much, your 2 nights saved my first night :D
@Thelin90 When I configure it as you did, I can submit a task from inside the master pod, but I still get the "Initial job has not accepted any resources" issue when submitting a task from other client pods.
@leehuwuj it might be that you are not adding the IP you get from kubectl get pods -o wide. Go inside the pod as I describe above: kubectl exec -it <pod> -n <namespace> -- /bin/bash. Take the IP and start pyspark --conf spark.driver.bindAddress=<MASTER-POD-IP> --conf spark.driver.host=<MASTER-POD-IP>, and then run: sc.parallelize([1,2,3,4,5,6]).collect(). That should work. Alternatively, I recommend you fork my repository: https://github.com/Thelin90/deiteo and try to run it with my instructions there; you might find what you have done differently from me. I have automated the whole process there.
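Spelled out end to end, the sequence from that comment looks roughly like this (pod name, namespace, and IP are placeholders you need to fill in yourself):

```sh
# Find the master pod and note its IP (<MASTER-POD-IP> below)
kubectl get pods -o wide -n <namespace>

# Open a shell inside the master pod
kubectl exec -it <master-pod> -n <namespace> -- /bin/bash

# Start pyspark with the driver bound to the master pod's IP
pyspark \
  --conf spark.driver.bindAddress=<MASTER-POD-IP> \
  --conf spark.driver.host=<MASTER-POD-IP>

# Then, inside the pyspark shell, a trivial job should be accepted by the workers:
# sc.parallelize([1, 2, 3, 4, 5, 6]).collect()
```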
@leehuwuj also, are you running on Ubuntu or Mac?
@Thelin90 I'm running on Mac. Running the submit command inside the master node is OK, but I cannot submit a task from another pod.
@leehuwuj I think you will need to be more specific with your issue, similar to what I have done here: I have described my problem in great detail, and how I solved it. If you have a new problem, you must try to put down, step by step, what you are doing and what is going wrong. I can't guess based on what you have written, unfortunately. But I would recommend you have a look at my own repo and try that; if it works for you, you might find something you are not doing right in your own code.
Please stick to the context of this issue; your question has nothing to do with what is being discussed here.
spark-shell is fine from inside the pod, but spark-submit for the Pi example is not successful, causing a websocket closed connection issue:

spark-submit --name sparkpi-1 \
This also worked.
I was inspired by this repository, and continue to build on it.
However, I also got the issue faced here: #1
I was getting:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I bashed my head around this for 2 nights, not being an expert in K8S; I first thought something was wrong with how I started it up. Either way, this is how I reproduced the problem:
1)
I checked my resources, and I made the following config:
spark-defaults.conf
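The exact contents of that spark-defaults.conf are not reproduced above; a minimal sketch of the kind of resource settings being checked (illustrative values only, not the author's actual numbers) would be:

```
spark.driver.memory     1g
spark.executor.memory   1g
spark.executor.cores    1
```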
2)
And I ran minikube with:
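The exact flags are not shown above; a typical minikube invocation that reserves enough CPU and memory for a master plus workers looks something like this (values are assumptions, adjust for your machine):

```sh
minikube start --cpus 4 --memory 8192
```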
3)
These were my spark-master and spark-worker scripts:

spark-worker.sh

### Note: I put 2g here just to be 100% confident I was not using too much resources.
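The script itself is not reproduced above; a rough sketch of a standalone-mode spark-worker.sh with the 2g cap mentioned in the note (the master service name is a placeholder) could look like:

```sh
#!/bin/bash
# Sketch only: start a standalone worker capped at 2g, pointed at the master service
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker \
  --webui-port 8081 \
  --memory 2g \
  spark://<spark-master-service>:7077
```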
4)
I then ran:
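The command itself is not preserved above, but based on the pyspark example in the comments it was presumably along these lines (without the driver bindAddress/host settings that turn out to be the fix):

```sh
# Start pyspark against the standalone master
pyspark --master spark://<spark-master-service>:7077

# then, in the shell:
# sc.parallelize([1, 2, 3, 4, 5, 6]).collect()
```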
And the error occurred!
I made sure to get access to 8081 and 4040 to investigate the logs further, and then I went in:
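Getting at those UIs from outside the cluster typically means port-forwarding the worker UI (8081) and the driver/application UI (4040); a sketch, with pod names as placeholders:

```sh
# Worker web UI
kubectl port-forward <worker-pod> 8081:8081 -n <namespace>

# Spark application UI on the driver (the master pod here, since pyspark runs there)
kubectl port-forward <master-pod> 4040:4040 -n <namespace>
```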
5)
I scratched my head, and I knew I had enough resources, so why did this not work?!
And I could see:
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: sparkmaster/10.101.97.213:41607
Caused by: java.net.ConnectException: Connection timed out
I then thought, well, I had done this right:

The docs mention that it can be either HOST or IP, so I thought I was good. I saw the possible solution of resolving iptables entries; well, this was not a problem for me, I actually had no iptables to resolve at all.

So I then verified the master IP, took the MASTER-IP, and added it directly:

6)
SOLUTION:
spark-defaults.conf

And add the IPs correctly:

spark-worker.sh
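Neither file is reproduced above, but pieced together from the rest of this thread, the essence of the fix is pinning the driver to the master pod's IP in spark-defaults.conf (with <MASTER-POD-IP> taken from kubectl get pods -o wide):

```
spark.driver.bindAddress   <MASTER-POD-IP>
spark.driver.host          <MASTER-POD-IP>
```

The spark-worker.sh presumably stays as sketched in step 3 (still capped at 2g and pointed at the master service); only the driver bind/host settings change.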
In this case my SPARK_HOME is /usr/local/spark

My Dockerfile
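The Dockerfile itself is not shown above; a minimal sketch that puts Spark under /usr/local/spark so that SPARK_HOME matches (base image and versions are assumptions) might look like:

```dockerfile
# Sketch only: install Spark under /usr/local/spark
FROM openjdk:8-jdk-slim
ARG SPARK_VERSION=3.0.1
ARG HADOOP_VERSION=2.7
RUN apt-get update && apt-get install -y curl python3 && \
    curl -fsSL "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" \
      | tar -xz -C /usr/local && \
    ln -s /usr/local/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} /usr/local/spark
ENV SPARK_HOME=/usr/local/spark
ENV PATH="${SPARK_HOME}/bin:${PATH}"
```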
Currently building a streaming platform in this repo:
https://github.com/Thelin90/deiteo