TPU Trillium Base Docker Image cannot initialize #8371
Comments
@tengyifei do you know if Trillium requires a special libtpu? AFAIK the new release Docker images should work on Trillium as long as the base image is right.
The appropriate libtpu should be baked into Are there any error logs in particular? What if you add some logging env vars according to https://github.com/pytorch/xla/blob/master/docs/source/learn/troubleshoot.md#environment-variables
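One way to act on the logging suggestion is to pass the debug variables into the container with `-e` flags. The sketch below only assembles the command line; it is a minimal example, not the maintainers' recommended invocation. The variable names `PT_XLA_DEBUG` and `TF_CPP_MIN_LOG_LEVEL` are assumed to match the linked troubleshooting guide and should be checked against it, and the helper `docker_run_command` is hypothetical.

```python
import shlex

# Assumed debug variables; verify the names and values against the
# troubleshooting guide linked above before relying on them.
DEBUG_ENV = {
    "PT_XLA_DEBUG": "1",
    "TF_CPP_MIN_LOG_LEVEL": "0",
}


def docker_run_command(image, env=None, extra_args=()):
    """Build a `docker run` argument list that forwards debug env vars.

    Hypothetical helper: it only constructs the command, it does not run it.
    """
    env = DEBUG_ENV if env is None else env
    cmd = ["docker", "run", "-ti", "--rm", "--privileged"]
    # One `-e KEY=VALUE` pair per variable, in a stable order.
    for key, value in sorted(env.items()):
        cmd += ["-e", f"{key}={value}"]
    cmd += list(extra_args)
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command for the `tpu` image from this issue.
    print(shlex.join(docker_run_command("tpu", extra_args=["-p", "5000:5000"])))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the output can be pasted into the TPU VM shell.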
Thank you for your quick response. I am on a TPU v6e VM with 4 cores. When I try to start the Docker container, I get an error in the TPU initialization phase. I am building and running my Docker image inside my TPU VM. The tpu-recipes benchmark code runs from outside the VM: it SSHes into the VM with --worker=all and passes the command. The tpu-recipes version worked, so why does my docker run version not work?
However, when I run without Docker, both my code and this example work well.
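Since the same code works on the VM but not in the container, it may help to run one small script in both places and diff the output. The sketch below is a rough diagnostic under stated assumptions: the device-node patterns `/dev/accel*` and `/dev/vfio/*` and the environment-variable prefixes are guesses about where TPU-related state might show up, not a documented list, and `snapshot` is a hypothetical helper.

```python
import glob
import importlib.util
import json
import os

# Assumed locations where TPU device nodes may appear; adjust for your VM.
DEVICE_PATTERNS = ("/dev/accel*", "/dev/vfio/*")
# Assumed prefixes of environment variables that may affect TPU startup.
ENV_PREFIXES = ("PJRT_", "TPU_", "XLA_")
# Top-level packages whose presence is worth comparing.
PACKAGES = ("torch", "torch_xla", "libtpu")


def snapshot():
    """Collect a few facts about the current environment for comparison."""
    return {
        "device_nodes": sorted(
            path for pattern in DEVICE_PATTERNS for path in glob.glob(pattern)
        ),
        "env": {
            key: value
            for key, value in sorted(os.environ.items())
            if key.startswith(ENV_PREFIXES)
        },
        # find_spec only checks that the package can be located; it does not import it.
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in PACKAGES
        },
    }


if __name__ == "__main__":
    print(json.dumps(snapshot(), indent=2))
```

Running it once on the host and once inside the container, then comparing the two JSON outputs, shows whether device nodes, environment variables, or installed packages differ between the two.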
The
@tengyifei @zpcore seems like we can improve our workflow to help our users avoid this issue.
TPU initialization fails
When I start a v6e-4 TPU VM with the v2-alpha-tpuv6e base image, using a pip environment and the XLA updates, I can initialize the TPUs without problems. However, when I dockerize my pipeline, it fails to initialize the TPUs. I tried many TPU XLA base images but could not get initialization to work. This happens every time I get the device from torch_xla.core.xla_model.xla_device().
I have checked these base images. I guess the v2-alpha-tpuv6e configuration is crucial; is there a related base Docker image?
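The failing step described above is the call to `torch_xla.core.xla_model.xla_device()`. A minimal probe like the following can isolate that call from the rest of the pipeline and capture the full error. It is only a sketch: `probe_xla_device` is a hypothetical helper, and it assumes nothing about the cause of the failure.

```python
import traceback


def probe_xla_device():
    """Try to acquire an XLA device and report the outcome instead of crashing."""
    try:
        # Import lazily so a missing or broken install is reported, not raised.
        import torch_xla.core.xla_model as xm
    except Exception as exc:
        return {"ok": False, "stage": "import", "error": repr(exc)}
    try:
        device = xm.xla_device()
    except Exception as exc:
        return {
            "ok": False,
            "stage": "xla_device",
            "error": repr(exc),
            "traceback": traceback.format_exc(),
        }
    return {"ok": True, "stage": "xla_device", "device": str(device)}


if __name__ == "__main__":
    result = probe_xla_device()
    for key, value in result.items():
        print(f"{key}: {value}")
```

Running the probe as the container's entry command, before starting app.py, separates import problems from device initialization problems and gives a traceback that can be pasted into the issue.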
To Reproduce
DevDockerfile
#app.py
Both files are in the same directory. Build the Docker image with
docker build -f DevDockerfile -t tpu .
Then run it in privileged mode.
docker run -ti --rm -p 5000:5000 --privileged tpu
Expected behavior
Environment