Unable to start Crackq with Ubuntu 20.04 and Nvidia Drivers #33
Comments
Moving past the issue above, I'm now running into a new problem: jobs fail when they are submitted. The original error is the following:
Updated
Updating
|
Thanks for the update. Have a look in /utils; there are a couple of scripts that will help you get more info on the error message for the speed_check queue, as it's a hidden job queue.
^ This will get you the list of jobs; copy the job ID in question from there.
^ This will get a more detailed error message for that job.
You may need to modify the scripts, as it looks like the name resolution has changed in Docker networking recently:
|
Running
|
OK. If you tick disable brain does it run the job or give more detail in the error? |
Disabling brain runs the job, from what I've noticed. I still don't have a decent pool of test hash files to use since it finished that |
Hello, I cannot get past the issue where Pypal failed to import. How did you solve it? Please describe the solution |
Try disabling the brain and it might show a more detailed error message. |
I'm getting the same
|
Like @adnahmed, I had to install an older version of rq for the script to run.
The error was
Once I did get the script to run, I got the same load key error that I see in the UI.
|
Don't bother debugging this, I've got updates to push imminently with the docker container and all python libs updated. Should be available later today, I'm just cleaning it up. |
Check out the dev branch, this should all be fixed there now. |
This should be fixed in master, let me know if it's still not working. |
I am running from master branch with Python 3.8 on Ubuntu. Jobs will run fine for a while and then start failing. Message below. A docker compose down+up gets jobs running again.
|
Try adding this to the docker-compose file:
Note 'runtime: nvidia' is removed. I'll do some further testing when I get the chance, probably on the weekend. |
With this docker-compose:
I get this error starting:
|
Is this with the nvidia devel image (nvidia/cuda:12.2.0-devel-ubuntu20.04)? |
This happened with both devel and runtime images |
It took over a week of running jobs through the system, but the |
Failed again today with the same
After restarting the containers the GPU is visible again.
Obviously this isn't a CrackQ issue, but is there any way in the API we can monitor for GPU "health"? That would be cleaner than waiting for jobs to fail. |
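Until something like this exists in the API, one option is to poll the GPU from outside CrackQ. The sketch below is only an illustration and is not part of CrackQ: the function name check_gpu_health is hypothetical, and it assumes nvidia-smi is on the PATH of whatever container or host runs it. It treats a non-zero exit code, or the "Failed to initialize NVML" message reported in this thread, as unhealthy.

```python
import subprocess


def check_gpu_health(cmd=("nvidia-smi", "-L"), timeout=10):
    """Return (healthy, detail) based on a single nvidia-smi call.

    Hypothetical helper, not a CrackQ function. `nvidia-smi -L` lists the
    visible GPUs and is used here only as a cheap visibility probe.
    """
    try:
        proc = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Binary missing, not executable, or the call hung.
        return False, str(exc)

    output = (proc.stdout + proc.stderr).strip()
    # The failure seen in this thread shows up as an NVML initialisation error.
    if proc.returncode != 0 or "Failed to initialize NVML" in output:
        return False, output
    return True, output


if __name__ == "__main__":
    healthy, detail = check_gpu_health()
    print("GPU visible" if healthy else "GPU NOT visible")
    print(detail)
```

Run from cron or a container healthcheck, this would flag the lost GPU before a job fails; restarting the containers remains a manual step.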
Regarding the loss of GPU visibility in the container, 'Failed to initialize NVML: Unknown Error', I found these issues: NVIDIA/nvidia-docker#1730. I will stop posting in this issue as it seems totally unrelated to CrackQ. Apologies for the distraction. |
Check the v0.1.2 branch out, I believe this should be fixed now. I had no issues testing on some ec2 test boxes. I'll close this off when I merge into master if I don't hear anything, but feel free to reopen |
Closing as above |
Prerequisites
Enable debugging:
sudo docker exec -it crackq /bin/sed -i 's/INFO/DEBUG/g' /opt/crackq/build/crackq/log_config.ini
Unable to do this, as the crackq container fails to start.
Prior to reaching this point I also had to tweak the Nvidia + Ubuntu Dockerfile, as Python3.7 and Python3.7-Dev are not available on Ubuntu 20.04 and would throw an error when running ./install.sh /docker/nvidia/ubuntu. I changed these to Python3.8 and Python3.8-Dev, which fixed the issue.
I needed to uncomment ENV DEBIAN_FRONTEND noninteractive, otherwise install.sh would hang on setting up a timezone for tzdata.
I also needed to change FROM nvidia/cuda:runtime-ubuntu20.04 to specify a version number. Using the most recent, I went with FROM nvidia/cuda:11.6.0-runtime-ubuntu20.04.
Describe the bug
Running the sudo docker-compose -f docker-compose.nvidia.yml up --build command, Crackq is unable to start. The error thrown after a long traceback through Python imports is the following:
ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/usr/local/lib/python3.8/dist-packages/markupsafe/__init__.py)
See picture below for the entire traceback.
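For anyone hitting the same ImportError: as far as I know, soft_unicode was removed from MarkupSafe in the 2.1 release, so any dependency that still imports it breaks once pip resolves a newer MarkupSafe. The snippet below is a rough check under that assumption; the helper names are made up for illustration and do not come from CrackQ.

```python
from importlib import metadata


def lacks_soft_unicode(version: str) -> bool:
    """Return True for MarkupSafe 2.1 and later (assumed to lack soft_unicode)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 1)


def installed_markupsafe_version():
    """Return the installed MarkupSafe version string, or None if absent."""
    try:
        return metadata.version("markupsafe")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_markupsafe_version()
    if version is None:
        print("markupsafe is not installed")
    elif lacks_soft_unicode(version):
        print(f"markupsafe {version}: soft_unicode is gone, older callers will fail")
    else:
        print(f"markupsafe {version}: soft_unicode is still available")
```

Running this inside the image shows whether the build pulled in a MarkupSafe release newer than what the other packages expect.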
To Reproduce
Steps to reproduce the behavior:
1. Pull the Crackq Repo
2. Following the readme, install the latest Docker and Docker-compose
3. Download the latest Nvidia Server Drivers.
This may vary depending on your GPUs. I'm using 7 ZOTAC 1080 Ti cards and I installed the recommended drivers shown by running the following.
4. Install Nvidia Docker
This part took some trial and error as the Crackq readme says the following:
However, the Nvidia-Docker Docs recommend installing nvidia-docker2, which resulted in some problems for me. Instead I followed the Crackq readme and did the following after installing nvidia-container-runtime per Nvidia-Container-Runtime.
5. Run Install.sh
6. Configuration
I ran through the configuration portion and did everything documented in the Crackq Configuration.
I skipped any type of authentication setup, modified the custom nginx config, and added my own certs to the proper directory.
Expected behavior
After all this I should be able to run the application with either:
sudo docker-compose -f docker-compose.nvidia.yml up --build
Or
sudo docker-compose -f docker-compose.nvidia.yml up -d
Debug output
This is the output I receive from the docker logs as the containers were starting:
Additional context
To add, I was able to temporarily work around this by modifying the Nvidia/Ubuntu/Dockerfile again and including this command, per this issue forum:
RUN pip3 install markupsafe==2.0.1
However, this then led to an issue where Pypal also failed to import. Unfortunately I don't have the output from the docker logs, but they showed a traceback similar to the one above, reporting that it was unable to locate pypal.
If I missed anything please let me know.