Google Cloud Deployment Fails when Installing the GRID Drivers #21
Comments
I don't think I can open a PR, so I will post the solution here. The Isaac Sim docs recommend running the script found here: https://docs.omniverse.nvidia.com/isaacsim/latest/installation/install_advanced_cloud_setup_gcp.html Fixing the Ansible role requires only some minor changes: line 35 of config.py needs to include the following:
In nvidia-driver.gcp.yml, from line 35 onward, I commented out the code that is no longer necessary. The GRID drivers are no longer needed and do not work at all, so we should simply run the install script as recommended in the Isaac Sim docs and the GCP docs.
Additionally, some logic should be added to double-check the installed driver and allow for multiple driver versions, and the Terraform code might need improvements.
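As a rough illustration of the "allow for multiple driver versions" idea, here is a minimal shell sketch. The helper name `driver_branch_accepted` and the branch numbers in the usage comment are placeholders chosen for illustration, not values taken from IsaacAutomator or from the Isaac Sim docs.

```shell
#!/bin/sh
# Hypothetical helper (not part of IsaacAutomator): succeed if the major
# branch of the installed driver version is one of the accepted branches.
# usage: driver_branch_accepted INSTALLED_VERSION BRANCH [BRANCH...]
driver_branch_accepted() {
    installed=$1
    shift
    # Keep only the part before the first dot, e.g. "550.90.07" -> "550".
    branch=${installed%%.*}
    for accepted in "$@"; do
        if [ "$branch" = "$accepted" ]; then
            return 0
        fi
    done
    # No accepted branch matched (also the case when no driver was found).
    return 1
}

# Example usage (assumes nvidia-smi is on PATH; branch list is a placeholder):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_branch_accepted "$installed" 535 550 || echo "driver missing or unsupported" >&2
```

A check like this could run before the install step, so the role skips reinstalling when an acceptable driver is already present and fails early otherwise.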
@renanmb Thank you so much for reporting and solving this! I will include it asap.
With the update to IsaacSim 4.2 and the new IsaacLab release, I tried to use IsaacAutomator to deploy instances on GCP. My previous deployments with Isaac 4.1 worked, and my AWS deployment with Isaac 4.2 works, so I have reason to believe that Google changed something on their hypervisor: I have issues installing the NVIDIA drivers, so this problem extends beyond IsaacAutomator. The Ansible roles must be reviewed for GCP. Not only do they no longer install the drivers reliably, there are also some infinite loops that are not accounted for.
Here is the command I used to deploy:
./deploy-gcp --ngc-api-key NGC-KEY --project PROJECT-ID --deployment-name test-gcp-issacsim --isaac --isaac-gpu-count 1 --isaac-instance-type g2-standard-32 --isaac-image nvcr.io/nvidia/isaac-sim:4.2.0 --vnc-password 123456 --zone us-west1-a --oige no --isaaclab v1.3.0
Error message:
This issue is mostly related to Ansible trying to install the NVIDIA drivers on GCP in a way that is no longer supported:
Inside: src/ansible/roles/nvidia/tasks we find the file nvidia-driver.gcp.yml
The TASK: name: GCP / Install GRID driver
is running the following command
./nvidia_driver.run --x-module-path=/usr/lib/xorg/modules/drivers --run-nvidia-xconfig --disable-nouveau --no-questions --silent
When logging into the instance and running the command described above I obtain the following error:
I followed the installation instructions in the GCP docs to fix the Ansible role: https://cloud.google.com/compute/docs/gpus/install-grid-drivers#debianubuntu
This installs the drivers, but the setup still cannot complete. Other issues arise, one of them being an infinite loop in autorun.yml:
xset is unable to open the display, and the task keeps waiting for it.
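One way to avoid waiting forever is to bound the number of retries. The sketch below is only an illustration of that idea, not the actual contents of autorun.yml; the helper name `wait_for`, the attempt count, the delay, and the display `:0` in the usage comment are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: retry a command a limited number of times
# instead of looping until it succeeds.
# usage: wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Stop as soon as the probed command exits with status 0.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    # Give up so the caller can fail the task instead of hanging.
    return 1
}

# Example usage (display :0 and the limits are assumptions):
#   wait_for 30 2 xset -display :0 q || { echo "X display :0 not ready" >&2; exit 1; }
```

With a bound like this, a display that never comes up surfaces as a failed task with a clear message rather than a deployment that hangs.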