Runner Fix nvidia setup and restart #607
As per golang/go#15113 (comment), the 2 second timeout we had in place should be enough, but it apparently isn't. 😅
See also golang/go#15113 → gravitational/teleport#1153 → gravitational/teleport#1152 for a possible solution.
We already have a solution in place.
Overall, looks good to me, although it would be nice to find a better (?) workaround (e.g. gravitational/teleport#1152) for the SSH timeout issue instead of leaking goroutines. Not that it matters much for this use case, but it doesn't feel quite right.
@0x2b3bfa0 We are using the gravitational/teleport#1152 workaround!
Sorry! I missed that commit! 🙈 🚀
LGTM
@0x2b3bfa0 let's merge and release?
This has been a tricky PR. After fixing the restart so that the GPU is accessible after a kernel upgrade, we hit a very funny error (seen here and here): the SSH connection hung on DIAL and never returned, so the resource timed out.
To solve it, I had to put the logs function within a timed-out function. We added Teleport's fix mentioned on #607 (comment).
instance_gpu to set up NVIDIA. Uses ubuntu-drivers for GPU auto detection.
Related: #606
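For readers unfamiliar with the hang discussed above: as golang/go#15113 describes, the dial timeout only bounds the TCP connect, so a host that accepts the connection and then goes silent can block the handshake forever. The sketch below shows the general shape of a deadline-based workaround using only the standard library. It is not the code from this PR or from Teleport; `dialWithDeadline`, `handshakeFunc`, `silentServer`, `waitForBanner` and `demoTimeout` are hypothetical names, and the toy handshake stands in for a real one such as `ssh.NewClientConn` from golang.org/x/crypto/ssh.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// demoTimeout is an arbitrary value chosen for this illustration.
const demoTimeout = 200 * time.Millisecond

// handshakeFunc is a hypothetical stand-in for a protocol handshake,
// e.g. a closure around ssh.NewClientConn in real code.
type handshakeFunc func(conn net.Conn) error

// dialWithDeadline bounds both the TCP connect and the handshake by timeout.
func dialWithDeadline(addr string, timeout time.Duration, handshake handshakeFunc) (net.Conn, error) {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return nil, err
	}
	// The deadline applies to every read and write made during the handshake.
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		conn.Close()
		return nil, err
	}
	if err := handshake(conn); err != nil {
		conn.Close()
		return nil, err
	}
	// Clear the deadline so the established session is not cut off later.
	if err := conn.SetDeadline(time.Time{}); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}

// silentServer accepts connections but never sends a byte,
// imitating a host that stalls in the middle of the handshake.
func silentServer() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	done := make(chan struct{})
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func() { <-done; c.Close() }()
		}
	}()
	return ln.Addr().String(), func() { close(done); ln.Close() }, nil
}

// waitForBanner is a toy handshake: it blocks until the server sends one byte.
func waitForBanner(conn net.Conn) error {
	buf := make([]byte, 1)
	_, err := conn.Read(buf)
	return err
}

// stalledDialTimesOut reports whether dialing the silent server
// fails with a timeout instead of hanging.
func stalledDialTimesOut(timeout time.Duration) (bool, error) {
	addr, stop, err := silentServer()
	if err != nil {
		return false, err
	}
	defer stop()
	conn, err := dialWithDeadline(addr, timeout, waitForBanner)
	if err == nil {
		conn.Close()
		return false, nil
	}
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout(), nil
}

func main() {
	timedOut, err := stalledDialTimesOut(demoTimeout)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("timed out:", timedOut)
}
```

Unlike wrapping the call in a goroutine with a timer, this approach leaves nothing running in the background once the deadline fires, because the blocked read itself returns an error.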