Restarting a CDN instance causes image pulls to take a long time #949
Comments
Could someone look into this issue?
I tried v2.0.2-alpha.6 with the same result: it occasionally takes a long time to pull an image.
I also tried Helm chart version 0.5.16 and hit the same problem; I doubt it can be deployed to a production environment.
Can anyone help? I deployed Dragonfly v2.0.2-alpha.6 with Helm chart 0.5.26 and default settings, with the same results. I restarted two CDN instances, and then it took a long time to pull an image. Four pods ran on four nodes; the time to pull a 143 MB image is: Restarting all the schedulers seems to have solved the problem.
@jim3ma Could you please take a look at this issue? I plan to deploy to a production environment soon.
Can you upload the logs of the CDN and dfdaemon?
Here are the logs: core-dfdaemon-dvqgp.log
Bug confirmed: the scheduler did not refresh the CDN status when the CDN IP changed, and returned the old IP to peers.
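To make the confirmed bug concrete, here is a minimal sketch of the kind of refresh the scheduler needs: when a CDN re-registers under the same host ID but with a new IP, the stored address must be replaced so that peers scheduled afterwards receive the fresh IP. This is not Dragonfly's actual scheduler code; all type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// CDNHost is a hypothetical record the scheduler keeps per CDN instance.
type CDNHost struct {
	HostID string
	IP     string
	Port   int
}

// CDNRegistry stores CDN hosts keyed by host ID.
type CDNRegistry struct {
	mu    sync.RWMutex
	hosts map[string]*CDNHost
}

func NewCDNRegistry() *CDNRegistry {
	return &CDNRegistry{hosts: make(map[string]*CDNHost)}
}

// Register inserts a CDN host, or refreshes its address if the host ID is
// already known but the IP/port changed (e.g. the pod was restarted).
func (r *CDNRegistry) Register(h *CDNHost) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if old, ok := r.hosts[h.HostID]; ok {
		// Without this refresh, peers keep being pointed at the old IP,
		// which is the behaviour reported in this issue.
		old.IP, old.Port = h.IP, h.Port
		return
	}
	r.hosts[h.HostID] = h
}

// Lookup returns the current address for a CDN host ID.
func (r *CDNRegistry) Lookup(hostID string) (*CDNHost, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	h, ok := r.hosts[hostID]
	return h, ok
}

func main() {
	reg := NewCDNRegistry()
	reg.Register(&CDNHost{HostID: "cdn-0", IP: "10.218.44.208", Port: 8003})
	// The pod restarts and comes back with a new IP.
	reg.Register(&CDNHost{HostID: "cdn-0", IP: "10.218.44.246", Port: 8003})
	if h, ok := reg.Lookup("cdn-0"); ok {
		fmt.Println(h.IP) // 10.218.44.246, the refreshed address
	}
}
```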
@likunbyl The latest code change fixes this issue; you can give it a try.
Thank you. I need a Docker image to try it out; could one be provided?
By the way, I have another issue about a preheat enhancement; do you have a plan for it?
Just run
I have tried the newest code. When I restarted all three CDN pods, it again took more than 1 minute to pull an image. Here are the logs:
I will investigate it. Please wait some time. |
done, thanks. |
Bug report:
I deployed Dragonfly v2.0.2-alpha.2 with Helm chart 0.5.26. The only change I made is that dfdaemon does not discover the schedulers automatically through the manager; instead, I specified the domain name of the schedulers.
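For illustration, here is a minimal sketch of the discovery choice described above: manager-based scheduler discovery is disabled and a static address (a DNS name) is used instead. This is not dfdaemon's real configuration code; the field names, function names, and the example address are assumptions.

```go
package main

import "fmt"

// SchedulerConfig is a hypothetical stand-in for the daemon's scheduler section.
type SchedulerConfig struct {
	DiscoverViaManager bool     // if true, ask the manager for scheduler addresses
	StaticAddrs        []string // used when manager discovery is disabled
}

// resolveSchedulers returns the scheduler addresses the daemon will dial.
func resolveSchedulers(cfg SchedulerConfig, fromManager func() []string) []string {
	if cfg.DiscoverViaManager {
		return fromManager()
	}
	return cfg.StaticAddrs
}

func main() {
	cfg := SchedulerConfig{
		DiscoverViaManager: false,
		// A service/domain name rather than pod IPs, as described in the report.
		StaticAddrs: []string{"dragonfly-scheduler.dragonfly-system.svc:8002"},
	}
	fmt.Println(resolveSchedulers(cfg, func() []string { return nil }))
}
```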
Then I did some testing. In one test, I restarted a CDN pod, after which it sometimes took 1 minute to pull an image that normally takes only 7 seconds.
From the logs, I noticed that dfdaemon tried to access the previous IP of the restarted CDN pod, not the new one:
In this log, 10.218.44.208 is the IP of the CDN pod before its restart, while 10.218.44.246, which comes from the peer ID, is the new IP. In the end, it timed out and eventually pulled the image from the back source:
This happened a few minutes after the CDN pod was restarted; by then the new IP had already been in the database table for a while:
If a detailed log is needed, I will provide it.
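The timeout-then-back-source behaviour described above can be sketched roughly as follows. The helper names are hypothetical, and the real daemon uses gRPC piece downloads rather than a bare TCP dial; this only illustrates why a stale CDN address makes the pull slow before the fallback kicks in.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// fetchFromCDN tries to reach the CDN address within the given timeout.
func fetchFromCDN(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// fetchFromBackSource stands in for pulling the layer directly from the origin.
func fetchFromBackSource(url string) error {
	fmt.Println("falling back to back source:", url)
	return nil
}

func main() {
	staleCDN := "10.218.44.208:8003" // old pod IP returned by the scheduler
	if err := fetchFromCDN(staleCDN, 2*time.Second); err != nil {
		// This is the slow path observed in the report: the dial has to time
		// out before the daemon gives up and goes back to source.
		fmt.Println("cdn fetch failed:", err)
		_ = fetchFromBackSource("https://registry.example.com/v2/...")
	}
}
```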
Expected behavior:
A CDN pod restart should not affect pull speed; dfdaemon should get the correct CDN IP.
How to reproduce it:
3. Restart a CDN pod.
Environment:
uname -a: 3.10.0-1160.31.1.el7.x86_64