Attempts at fixing CI #144
The Linux build with CUDA 11.8 is failing with the error
Are we doing something to cause it to use an unreasonable amount of disk space? I don't see anything obvious.
See here, I got the CUDA 11.8 one working: For CUDA 12 I am not able to convince pytorch to find CUDA sources.
Do we really need to use that action? Since all we're doing is building CUDA code, not running it, can we get by with just the CUDA conda packages?
For some reason the Python 3.12 build keeps installing Python 3.10 instead. I can't figure out where that's coming from.
Your approach worked for CUDA 11.8. But for 12 it runs out of disk space before there's a chance to clean up after it.
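One way to see which step eats the space is to log free disk space before and after each heavy step. The following is only a sketch and not something from this PR: the helper names `free_gib` and `report` are hypothetical, and it assumes a Python interpreter is already available on the runner.

```python
import shutil


def free_gib(path="/"):
    """Return the free space on the filesystem containing `path`, in GiB."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 2**30


def report(label, path="/"):
    """Print and return a one-line summary of free space, tagged with `label`."""
    line = f"[disk] {label}: {free_gib(path):.1f} GiB free on {path}"
    print(line)
    return line


if __name__ == "__main__":
    # Hypothetical usage: call once before and once after the CUDA install step.
    report("before CUDA install")
```

Comparing the two numbers around the CUDA 12 install would show whether the install itself exhausts the disk, or whether the space is already low when it starts.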
The action is always giving us headaches, but before CUDA 12 it was the only sane way to get nvcc.
For the OpenMM repo we instead use this script to install CUDA. I'll try using it instead.
I'm running out of patience with this. I suggest we just do the tests we can do and not worry about the rest. That means,
Ok! I finally have all tests passing. That required cutting back on what tests we run, but I think that's the best we can do for the moment. Without having actual GPUs to test on, the CI won't provide really satisfactory testing no matter what we do. It will catch what it can catch, and we'll have to do additional testing by hand. This is ready for a first review.
os: ubuntu-22.04
cuda-version: "11.8.0"
gcc-version: "10.3.*"
nvcc-version: "11.8"
python-version: "3.10"
pytorch-version: "2.0.*"
pytorch-version: "2.1.*"
There is pytorch 2.3 in conda-forge.
I just tried 2.3, but it fails to install.
Could not solve for environment specs
The following packages are incompatible
├─ __cuda is requested and can be installed;
├─ python 3.10** is installable with the potential options
│ ├─ python [3.10.0|3.10.10|...|3.10.9], which can be installed;
│ └─ python [3.10.0|3.10.1|...|3.10.9] would require
│ └─ python_abi 3.10.* *_cp310, which can be installed;
└─ pytorch-gpu 2.3** is not installable because there are no viable options
├─ pytorch-gpu 2.3.0 would require
│ └─ pytorch 2.3.0 cuda118_py39hd44be3b_300, which requires
│ ├─ python >=3.9,<3.10.0a0 , which conflicts with any installable versions previously reported;
│ └─ python_abi 3.9.* *_cp39, which conflicts with any installable versions previously reported;
└─ pytorch-gpu 2.3.0 would require
└─ pytorch 2.3.0 cuda120_py38heb61fd4_300, which requires
└─ cuda-version >=12.0,<13 , which requires
└─ __cuda >=12 , which conflicts with any installable versions previously reported.
I tried 2.2, but it reports that version isn't available at all. I switched back to 2.1.
PyTorch 2.3 seems to require CUDA >=12, but only for python>3.9? To be honest I am not sure; I can never fully grasp these conda errors.
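To make the solver output a bit easier to read, the two build strings it mentions can be decoded and compared against what the job asks for (CUDA 11.8 and Python 3.10). This is a rough sketch under an assumption: that the build strings follow the pattern `cuda<major><minor>_py<major><minor>h<hash>_<build number>`. The helper names `parse_build` and `mismatches` are hypothetical. The solver may list only some of the candidate builds, so this explains the two it shows and says nothing about whether other builds exist.

```python
import re

# Build strings copied from the solver output above.
BUILDS = ["cuda118_py39hd44be3b_300", "cuda120_py38heb61fd4_300"]

# Assumed naming pattern: cuda<major><minor>_py<major><minor>h<hash>_<build number>.
PATTERN = re.compile(r"^cuda(\d+)(\d)_py(\d)(\d+)h[0-9a-f]+_\d+$")


def parse_build(build):
    """Return ((cuda_major, cuda_minor), (py_major, py_minor)) for a build string."""
    m = PATTERN.match(build)
    if m is None:
        raise ValueError(f"unrecognized build string: {build}")
    cuda_major, cuda_minor, py_major, py_minor = (int(g) for g in m.groups())
    return (cuda_major, cuda_minor), (py_major, py_minor)


def mismatches(build, want_cuda_major, want_python):
    """List the reasons a build does not fit the requested CUDA major and Python."""
    cuda, python = parse_build(build)
    reasons = []
    if cuda[0] != want_cuda_major:
        reasons.append(f"built for CUDA {cuda[0]}.{cuda[1]}")
    if python != want_python:
        reasons.append(f"built for Python {python[0]}.{python[1]}")
    return reasons


for build in BUILDS:
    # The job in this matrix entry asks for CUDA 11.x and Python 3.10.
    print(build, "->", mismatches(build, 11, (3, 10)))
```

Read this way, the first build shown targets CUDA 11.8 but Python 3.9, and the second targets CUDA 12.0 and Python 3.8, so neither of the two listed candidates fits a CUDA 11.8 job running Python 3.10.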
Going forward, I have proposed some suggestions that may help alleviate these issues (#146). They may also help with moving to CUDA 12 when that happens.
We're getting a variety of errors on CI. I'm going to see if I can fix them.
The Linux builds fail with the compilation error
This is apparently due to a recent change to conda-forge. Hopefully it can be fixed by installing an extra package.
The Mac builds report two errors. The OpenCL tests fail with
Apparently you're supposed to specify -lSystem when running ld. But in this case it isn't anything we have control over. It's being called internally by PoCL when it tries to compile kernels. We don't really want it using PoCL anyway, so I'll see if I can disable it.
There's also a test failure
That's probably just a path issue.
Closes #126