Shim package to interact with an **existing** NVCC install #8229
Conversation
This adds a shim package to help conda-build interact with an **existing** `nvcc` install. Through this shim compiler package, users can ensure an appropriate `cudatoolkit` version is attached to the built package, so that requirement is met at install time.
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (…).
recipes/nvcc/build.sh (outdated):

```bash
CUDA_HOME="$(dirname $(dirname $(which nvcc)))"
fi

# Set `CUDA_HOME` in an activation script.
```
If `CUDA_HOME` is already set, should we store the value into `BACKUP_CUDA_HOME` or similar and restore it in the deactivate script?
We could do that. Though this will probably be a little gross to write/read (we need to escape some stuff). Also it probably won't matter that much if users are already using this with our Docker image (the intent here).
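For illustration, a backup/restore pair along those lines might look like the following sketch (the `CONDA_BACKUP_CUDA_HOME` name and the fixed `/usr/local/cuda` path are placeholders, not the PR's actual code):

```bash
# activate.d/nvcc_activate.sh -- illustrative sketch only
if [ -n "${CUDA_HOME:-}" ]; then
    # Preserve any pre-existing value so deactivation can restore it.
    export CONDA_BACKUP_CUDA_HOME="${CUDA_HOME}"
fi
export CUDA_HOME="/usr/local/cuda"

# deactivate.d/nvcc_deactivate.sh -- illustrative sketch only
if [ -n "${CONDA_BACKUP_CUDA_HOME:-}" ]; then
    export CUDA_HOME="${CONDA_BACKUP_CUDA_HOME}"
    unset CONDA_BACKUP_CUDA_HOME
else
    unset CUDA_HOME
fi
```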
recipes/nvcc/build.sh (outdated):

```bash
mkdir -p "${PREFIX}/etc/conda/activate.d"
cat > "${PREFIX}/etc/conda/activate.d/${PKG_NAME}_activate.sh" <<EOF
#!/bin/bash
export CUDA_HOME="${CUDA_HOME}"
```
This hardcodes the CUDA install location, right? How about making the user supply this? (This is what we do with the macOS SDK and `CONDA_BUILD_SYSROOT`.)
It does. Though the intent is for this to come from the Docker image, where this location is fixed. So it probably doesn't matter much that it is hardcoded.
I suppose we could do that. Though I'm not clear on what the intent would be. Do we want users to use this package outside of the Docker image? It's not clear to me how we can do that reasonably yet. Would prefer getting this out in a limited form and seeing what we can accomplish with it already before thinking about other ways to use it.
> Do we want users to use this package outside of the Docker image?

Yes. `CONDA_BUILD_SYSROOT` is a way for users to use the macOS compilers outside of conda-build.
I think we want to get this correct the first time to ensure that we don't need to keep on supporting older conventions.
Users are still able to build packages with this convention outside of conda-forge (have been doing that for a bit, in fact). This works the same as it does with other Linux builds. In other words, users run a build script that starts one of our Docker images and does the build within it.
No objections to getting things right. I'm just not seeing how this is causing issues. Could you please clarify the problems you are seeing here?
What happens to users who don't have access to Docker? For example, I was building `qt` on a machine without access to Docker, and it worked, because you really don't need the Docker image. With this package, you are tying the building of conda packages to the Docker image.
This opens us up to figuring out how people have installed the NVIDIA toolchain, whether it was done correctly, and trying to debug the various issues they may encounter by doing this incorrectly. I'd prefer to avoid that exposure by limiting our supported case to one that we know behaves well.
Okay, then please add some code to the activate script to error out if `${CUDA_HOME}` and `${CUDA_HOME}/lib64/stubs/libcuda.so` are not found.
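A minimal sketch of such a check, assuming the activate script is sourced (so `return` rather than `exit` is appropriate):

```bash
# Fail activation early if the expected CUDA install is missing.
if [ ! -d "${CUDA_HOME}" ]; then
    echo "Error: CUDA_HOME=${CUDA_HOME} does not exist." >&2
    return 1
fi
if [ ! -f "${CUDA_HOME}/lib64/stubs/libcuda.so" ]; then
    echo "Error: ${CUDA_HOME}/lib64/stubs/libcuda.so not found." >&2
    return 1
fi
```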
Sounds reasonable. Have pushed such a test. Please take a look.
Can you also check that the `nvcc` version matches the package version?
Done.
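The exact test isn't shown here, but one plausible sketch, assuming the expected version is templated into the script at build time (represented below by `${PKG_VERSION}`):

```bash
# Extract "major.minor" from `nvcc --version` (e.g. "release 9.2" -> "9.2")
# and compare it against the version this package was built for.
nvcc_version="$(nvcc --version | sed -n 's/^.*release \([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p')"
if [ "${nvcc_version}" != "${PKG_VERSION}" ]; then
    echo "Error: nvcc ${nvcc_version} does not match package version ${PKG_VERSION}." >&2
    return 1
fi
```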
As there were a few questions about the …

Looks reasonable to me, thanks for the draft PR!
Namely, we need to find the compiler's sysroot (which the compiler itself knows) and symlink `libcuda` there so that it can be found during a build.
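For example, something along these lines (the destination path inside the sysroot is an assumption for illustration):

```bash
# Ask the C++ compiler where its sysroot lives, then link the CUDA
# driver stub there so `-lcuda` resolves at build time.
sysroot="$("${CXX}" -print-sysroot)"
mkdir -p "${sysroot}/usr/lib"
ln -sf "${CUDA_HOME}/lib64/stubs/libcuda.so" "${sysroot}/usr/lib/libcuda.so"
```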
We need a way of forwarding the compiler preference to `nvcc`. Namely, we want to ensure that the Anaconda C++ compiler is used. This script accomplishes that by adding a flag specifying the right compiler to use.
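Conceptually, the wrapper could be as small as this sketch (`-ccbin` is nvcc's flag for selecting the host compiler; the wrapper shadowing the real `nvcc` on `PATH` is the assumed mechanism):

```bash
#!/bin/bash
# Wrapper named `nvcc`, placed on PATH ahead of the real binary,
# forwarding all arguments while pinning the host C++ compiler.
exec "${CUDA_HOME}/bin/nvcc" -ccbin "${CXX}" "$@"
```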
Provides a simple test for compiling some CUDA code.
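A trivial smoke test along these lines would suffice (the actual test file in the PR may differ):

```bash
# Compile (but don't run -- CI machines have no GPU) a minimal kernel.
cat > test.cu <<'EOF'
#include <cstdio>

__global__ void kernel() {}

int main() {
    kernel<<<1, 1>>>();
    std::printf("compiled OK\n");
    return 0;
}
EOF
nvcc test.cu -o test_cuda
```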
After playing with this a bit more, I found it needed a few more changes. In particular, this now contains a small shell script, which calls `nvcc`. In addition, I've added a small compilation test to build some trivial CUDA code.
Ensure that CUDA includes are added to the C and C++ compiler flags so they can be found when building.
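In activation-script terms, this presumably amounts to something like the following sketch:

```bash
# Illustrative only: expose the CUDA headers to the C/C++ compilers too.
export CFLAGS="${CFLAGS:-} -I${CUDA_HOME}/include"
export CXXFLAGS="${CXXFLAGS:-} -I${CUDA_HOME}/include"
```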
As the Docker image knows which CUDA version it provides, just copy that version into the `nvcc` package when creating it. After all, the point of this package is to act as a compiler shim for `nvcc` in a Docker container with that version.
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR (…). Here's what I've got...

For recipes/nvcc:
This should ensure re-rendering goes smoothly, as re-rendering mocks the architecture anyway, which could otherwise have ill effects when creating the feedstock.
Should make the linter happier.
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR (…). Here's what I've got...

For recipes/nvcc:
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (…).
@conda-forge/core, this is ready for a final review. 😄 Builds will fail here as we need to use the CUDA Docker images. However, we can't easily use them without this package in place first. So I propose we get this through (as long as we are happy with it). Then we can deal with the rest of the configuration issues on the feedstock.
The build script no longer makes sense here as that happens before the split package is handled.
Force-pushed from 37b7dcd to a4b5c68.
```
@@ -0,0 +1,64 @@
{% set name = "nvcc" %}
{% set version = environ.get("CUDA_VER", "9.2") %}
```
Can you add a `recipe/conda_build_config.yaml` with

```yaml
CUDA_VER:
  - 9.2
  - 10.0
```

and use the value like `{% set version = CUDA_VER %}`?
The `9.2` is really only here to make the linter happy. Personally I'd rather not have it here at all if it could be avoided. Basically what this is doing is picking up `CUDA_VER` from the Docker image.
Co-Authored-By: Isuru Fernando <[email protected]>
Any other thoughts @conda-forge/core? If not, am planning on merging soon 🙂
```yaml
- libgcc-ng
run_exports:
  strong:
    - cudatoolkit {{ version }}|{{ version }}.*
```
You need the `run_exports` of the C++ compiler runtime here as well.
Why? If the user is using the C++ compiler, that will be added by this anyways.
But the runtimes like `libgcc-ng` and `libstdcxx-ng` will not be added. `run_exports` is not transitive.
I don't think users should be relying on `nvcc` to add these things. If they are using a C or C++ compiler, they should list those requirements explicitly in their recipe.
Then the C and C++ compilers should not be in the run requirements. The way the compiler package is set up, if the user tries to use a C++ compiler in the Docker image, it will not be found unless it is an explicit requirement. Having the explicit requirement means that the `run_exports` for `libgcc-ng` is added. But now, if the user adds `nvcc_linux-64`, they get the C++ compiler, but no `run_exports` for `libgcc-ng`.
That makes sense. Have dropped the C and C++ compilers from `run`. They won't be installed now. The test has been updated to list C and C++ as requirements of the test.
recipes/nvcc/meta.yaml (outdated):

```yaml
- test -f "${PREFIX}/etc/conda/{{ state }}.d/{{ PKG_NAME }}_{{ state }}.sh"
{% endfor %}
# Try using the activation scripts.
- source ${PREFIX}/etc/conda/activate.d/{{ PKG_NAME }}_activate.sh
```
Can you remove this line? It should have been run already.
We could, but this makes it clearer to the reader what we are testing. Explicit is better than implicit.
But then, we are not testing that `CUDA_HOME` is automatically set. We are checking that `CUDA_HOME` is set when that script is run.
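In other words, since conda activates the test environment (and thus runs the `activate.d` scripts) before the test commands execute, a sketch of a test for automatic setup would check the variable directly, without sourcing anything first:

```bash
# The environment is already activated here, so these only pass if the
# activate.d script ran automatically.
test -n "${CUDA_HOME:-}"
test -f "${CUDA_HOME}/lib64/stubs/libcuda.so"
```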
I see your point. That said, do these test modifications need to be handled in `staged-recipes`, or can they be handled in the feedstock? Note that more work will need to be done in the feedstock anyways.
Okay. If you fix the other issue, then I'll merge. Also, please wait for my review on a PR in the feedstock before this package is released.
As a higher level comment, making these suggestions is good. Just want to make sure we are progressing on the larger objectives they sit within as well.
I switched things around so now we test that activation works without running the activation script first. Also added an explicit test of the activation script after running/testing the deactivation script. This also allowed a little bit of cleanup around the build step. So hopefully a bit nicer. Though we can discuss more in the feedstock.
Do not explicitly include the C and C++ compilers as users should explicitly add these as requirements of their package.
This shows users how they should be using `nvcc` with other compilers and what this looks like. Also tests the expected use case to make sure it works correctly.
```yaml
extra:
  recipe-maintainers:
    - jakirkham
```
Please add me as well.
Sure, would be happy to have your help here. 🙂 Added you in the last commit.
Make sure that activation works without running the activation script. Also test the activation script works as expected when run explicitly.
These should no longer be needed as the compiler's environment is properly activated now after running the activation tests.
I think I've addressed all of your concerns, @isuruf. Could you please take another look? 🙂

Thanks @isuruf! 🎉
Have been using this to build some GPU packages for work. Am placing it up here as a draft PR to solicit feedback and start the discussion towards building a community standard. Please share your thoughts.
Summary: This creates a small shim package designed to present an existing NVCC compiler installed by the user to conda-build. Additionally, it attaches the `cudatoolkit` package from `defaults` as a runtime requirement, with an appropriate version constraint, to anything that uses this compiler. The goal is to make it easy to build Conda packages that use GPUs. It works well in conjunction with the Docker image proposed in PR conda-forge/docker-images#93.

cc @kkraus14 @mike-wendt @mrocklin @datametrician @msarahan @seibert @jjhelmus @soumith (feel free to include others I may have missed)