Can't use AWS Instance GPU on GITLAB CI and CML-RUNNER #848
👋 Hello, @leoitcode! Can you please connect to the instance through SSH and retrieve some additional information?
$ npm --version
$ cat /var/log/*cloud*
$ nvidia-smi
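The three diagnostic commands above can be run in one go, with their output collected into a single report to attach to the issue. This is a minimal sketch, assuming bash on the instance; `run_step` and `collect_diagnostics` are hypothetical helper names, not part of CML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each diagnostic command requested in the issue,
# print the command line first, and keep going even if one of them fails.

run_step() {
  printf '$ %s\n' "$*"
  # Redirect stderr too, so errors such as "command not found" are captured.
  "$@" 2>&1 || printf '(exit status %d)\n' "$?"
}

collect_diagnostics() {
  run_step npm --version
  # The glob is expanded by the shell before run_step sees the arguments.
  run_step cat /var/log/*cloud*
  run_step nvidia-smi
}

# Example: collect_diagnostics > cml-diagnostics.txt
```

Because a failing step only prints its exit status, a missing or broken `nvidia-smi` shows up as an error line in the report instead of aborting the collection.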
Hello
I tried to follow the tutorial, but I'm getting this error:
I'm using this command:
Just adding the CI job logs for the deploy_job step: and the train_job step:
@leoitcode My guess is that it is having trouble parsing the output from 🤔 There might be a bug in handling that option; this is the first time I've seen it used. In the meantime, you can probably use the AWS web console to connect to the instance instead of trying to pass your private key.
I tried accessing the instance through the AWS Console, but I got the following error:
Some usernames that I tried:
@dacbd I managed to make it work by adding EOF to my pem file:
To be honest, I have no idea how this works; I just guessed it might be that after looking at what you did here:
Maybe there is a more elegant way of doing this 😆
@leoitcode and I are working on this together. I connected to the deployed instance and managed to execute the
But no
I thought this might be the problem after looking at this line, but I'm not sure:
Testing container
Inside the instance, if I do this:
But if I do this:
I couldn't find where something like this (
Maybe there is a config file for the gitlab-runner that should be changed, something like this: https://docs.gitlab.com/runner/configuration/gpus.html
I also noticed that GitHub Actions has this option, as in this snippet:
    run:
      needs: deploy-runner
      runs-on: [self-hosted,cml-runner]
      container:
        image: docker://iterativeai/cml:0-dvc2-base1-gpu
        options: --gpus all
      steps:
        - uses: actions/checkout@v2
I'm attaching the result of
Let us know if we can provide any other information. Thanks.
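To tell whether the problem is the host driver or the container's access to the GPU, the two checks can be run in order on the instance itself. This is a minimal sketch, assuming Docker and the NVIDIA Container Toolkit are installed on the instance; `check_gpu` is a hypothetical helper, and `--gpus all` is the same Docker flag that the GitHub Actions snippet above passes through `options:`. On the GitLab side, the GitLab Runner GPU page linked above describes a `gpus` setting in the runner's Docker configuration; whether a runner launched by cml-runner picks it up is an assumption to verify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run on the EC2 instance (e.g. over SSH), not inside CI.
# Assumes Docker and the NVIDIA Container Toolkit are installed on the host.

check_gpu() {
  local image="${1:-iterativeai/cml:0-dvc2-base1-gpu}"
  local smi="${2:-nvidia-smi}"

  # 1. The host driver must work before any container can use the GPU.
  if ! "$smi" >/dev/null 2>&1; then
    echo "host: $smi failed or was not found"
    return 1
  fi

  # 2. A container only sees the GPU when it is requested with --gpus.
  if ! docker run --rm --gpus all "$image" "$smi" >/dev/null 2>&1; then
    echo "container: $smi failed inside $image"
    return 2
  fi

  echo "GPU visible on the host and inside $image"
}

# Example: check_gpu iterativeai/cml:0-dvc2-base1-gpu
```

Checking the host first keeps the two failure modes apart: return code 1 points at the driver, return code 2 at how the container is started.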
Wish I could help, but I think we need to wait for the world to rotate back to @0x2b3bfa0's side 🌍 I see some npm install errors; you could try adding a startup script with
Are you using gitlab-ci or GitHub Actions?
Having seen that Line 204 in e338266
@DavidGOrtega these lines wouldn't make Lines 177 to 181 in e338266
I think this might be the problem. If not that, what else could I be missing?
@dacbd I'm using gitlab-ci, with the YAML file that @leoitcode posted in the first comment of the issue.
If
You're right, my bad.
We still can't make this work. Is there anything else we can try, or any other information, logs, etc. that we can provide?
I see
O.o'' @dacbd thank you so much. I can't believe we couldn't see it.
No worries, it's safe to say we all do it 🙈
OMG I hate typos! 😳 We were biased because in our first try the GPU didn't work, but now we know it was a configuration problem. Anyway, I'm so sorry for this, and thank you all so much for your patience and attention. We will close this now 🙈🙈🙈
What about this one, @dacbd? Can I contribute to it somehow? Is this really a bug, or just an encoding problem on my side?
@gitdoluquita if you are looking for something to try contributing to, I think there is some value here. I would argue that how the param was used
I think this most likely has to do with yargs parsing, and that would have to be an upstream change; however, I suspect that I would
@gitdoluquita if you want help getting started, poke me on Discord; happy to help get you going on anything if you need it. I think our time zones overlap more 🌎
Opened #852 for the SSH key passing issue :) PRs welcome!
That would be awesome, @dacbd! I'm planning to work on this soon. What is your nickname there? I'm in the DVC Discord server as luccasqdrs.
@gitdoluquita dabarnes
I have this gitlab-ci.yml:
But the container can't recognize the driver or the GPU; on the nvidia-smi command I got the following error:
/usr/bin/bash: line 133: nvdia-smi: command not found
I realized that iterativeai/cml:0-dvc2-base1-gpu can't use the instance GPU. How could I install the NVIDIA drivers and nvidia-docker, and activate the --gpus option on this Docker container?
Thank you
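The `command not found` error in the log above names `nvdia-smi`, not `nvidia-smi`, and elsewhere in the thread the failure is put down to a typo. A small guard at the top of a CI job script makes that kind of mistake obvious before anything else runs. This is a minimal sketch in bash; `require_cmd` is a hypothetical helper name, not part of CML or GitLab CI.

```shell
#!/usr/bin/env bash
# Hypothetical guard for a CI job script: fail early, with a clear message,
# when a required command is not on PATH (for example a misspelled name).

require_cmd() {
  local name
  for name in "$@"; do
    if ! command -v "$name" >/dev/null 2>&1; then
      echo "missing command: $name" >&2
      return 1
    fi
  done
}

# Example (at the top of the job's script section):
#   require_cmd nvidia-smi && nvidia-smi
```

`command -v` only checks that the name resolves, so it separates a misspelled or uninstalled tool from a real driver or GPU problem, which `nvidia-smi` itself would report once it is found.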