
Shim package to interact with an **existing** NVCC install #8229

Merged (20 commits), Aug 16, 2019

Conversation

Member

@jakirkham jakirkham commented Apr 29, 2019

I have been using this to build some GPU packages for work. I'm posting it here as a draft PR to solicit feedback and start the discussion toward building a community standard. Please share your thoughts.

Summary: This creates a small shim package designed to present an existing NVCC compiler, installed by the user, to conda-build. Additionally, it attaches the cudatoolkit package from defaults, with an appropriate version constraint, as a runtime requirement to anything that uses this compiler. The goal is to make it easy to build Conda packages that use GPUs. It works well in conjunction with the Docker image proposed in PR ( conda-forge/docker-images#93 ).

cc @kkraus14 @mike-wendt @mrocklin @datametrician @msarahan @seibert @jjhelmus @soumith (feel free to include others I may have missed)

Checklist

  • License file is packaged (see here for an example)
  • Source is from official source
  • Package does not vendor other packages
  • Build number is 0
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there

This adds a shim package to help conda-build interact with an
**existing** nvcc install. Through this shim compiler package,
users are able to ensure an appropriate cudatoolkit version is
attached to the package and thus make sure that requirement is
met at install time.
@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/nvcc) and found it was in an excellent condition.

CUDA_HOME="$(dirname $(dirname $(which nvcc)))"
fi

# Set `CUDA_HOME` in an activation script.
Contributor

If CUDA_HOME is already set should we store the value into BACKUP_CUDA_HOME or similar and restore it in the deactivate script?

Member Author

We could do that. Though this will probably be a little gross to write/read (we need to escape some stuff). Also it probably won't matter that much if users are already using this with our Docker image (the intent here).
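For reference, the suggested backup/restore could be sketched roughly as follows (a minimal, illustrative sketch, not the recipe's actual code; the `_CONDA_BACKUP_CUDA_HOME` name and the `/opt/mycuda` and `/usr/local/cuda` paths are made up for the example):

```shell
# Illustrative demo: preserve a pre-existing CUDA_HOME on activate
# and restore it on deactivate. All names and paths are hypothetical.
export CUDA_HOME="/opt/mycuda"   # pretend the user already had this set

# --- activate.d script ---
if [ -n "${CUDA_HOME:-}" ]; then
    # Stash the user's value so deactivation can restore it.
    export _CONDA_BACKUP_CUDA_HOME="${CUDA_HOME}"
fi
export CUDA_HOME="/usr/local/cuda"

# --- deactivate.d script ---
if [ -n "${_CONDA_BACKUP_CUDA_HOME:-}" ]; then
    # Restore whatever was set before activation.
    export CUDA_HOME="${_CONDA_BACKUP_CUDA_HOME}"
    unset _CONDA_BACKUP_CUDA_HOME
else
    unset CUDA_HOME
fi
```

After the deactivate portion runs, `CUDA_HOME` is back to the user's original value (or unset if it was never set).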

mkdir -p "${PREFIX}/etc/conda/activate.d"
cat > "${PREFIX}/etc/conda/activate.d/${PKG_NAME}_activate.sh" <<EOF
#!/bin/bash
export CUDA_HOME="${CUDA_HOME}"
Member

This hardcodes the CUDA install location, right? How about making the user supply it? (This is what we do with the macOS SDK and CONDA_BUILD_SYSROOT.)

Member Author

It does. Though the intent is for this to come from the docker image where this location is fixed. So it probably doesn't matter much that it is hardcoded.

I suppose we could do that. Though I'm not clear on what the intent would be. Do we want users to use this package outside of the Docker image? It's not clear to me how we can do that reasonably yet. Would prefer getting this out in a limited form and seeing what we can accomplish with it already before thinking about other ways to use it.

Member

Do we want users to use this package outside of the Docker image?

Yes. CONDA_BUILD_SYSROOT is a way for users to use the macOS compilers outside of conda-build.

I think we want to get this correct the first time to ensure that we don't need to keep on supporting older conventions.

Member Author

Users are still able to build packages with this convention outside of conda-forge (I have been doing that for a while, in fact). This works the same as other Linux builds: users run a build script that starts one of our Docker images and does the build within it.

No objections to getting things right. Am just not seeing how this is causing issues. Could you please clarify on the problems you are seeing here?

Member

What happens to users who don't have access to docker?

For example, I was building qt on a machine without access to docker, and it worked, because you really don't need the docker image. With this package, you are tying the building of conda packages to the docker image.

Member Author

@jakirkham jakirkham Aug 15, 2019

This opens us up to figuring out how people have installed the NVIDIA toolchain, whether it was done correctly, and trying to debug the various issues they may encounter by doing this incorrectly. I'd prefer not to have that exposure by limiting our supported case to one that we know behaves well.

Member

Okay, then please add some code to the activate script that errors out if ${CUDA_HOME} or ${CUDA_HOME}/lib64/stubs/libcuda.so is not found.

Member Author

Sounds reasonable. Have pushed such a test. Please take a look.
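For the record, such a check might look roughly like this (a hedged sketch, not the exact code pushed to the recipe; the function name is made up):

```shell
# Illustrative sanity check: fail activation loudly if the expected
# NVCC install or the driver stub library is missing.
check_cuda_home() {
    cuda_home="$1"
    if [ ! -d "${cuda_home}" ]; then
        echo "ERROR: CUDA_HOME=${cuda_home} does not exist." >&2
        return 1
    fi
    if [ ! -f "${cuda_home}/lib64/stubs/libcuda.so" ]; then
        echo "ERROR: ${cuda_home}/lib64/stubs/libcuda.so not found." >&2
        return 1
    fi
}
```

The activation script would call something like `check_cuda_home "${CUDA_HOME}" || return 1` before exporting anything.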

Member

Can you also check that the nvcc version matches the package version?

Member Author

Done.
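A version check of this sort might be sketched as follows (illustrative only; `nvcc --version` does print a "release X.Y" line, but the helper name and the sample output string here are assumptions, and the real check would compare against the shim package's own version, e.g. `${PKG_VERSION}`):

```shell
# Extract "X.Y" from the "release X.Y" line of `nvcc --version` output.
parse_nvcc_release() {
    sed -n 's/^.*release \([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

# Demo with a captured sample line rather than a live nvcc call.
sample="Cuda compilation tools, release 9.2, V9.2.148"
version="$(printf '%s\n' "${sample}" | parse_nvcc_release)"
echo "${version}"
```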

@jakirkham
Member Author

As there were a few questions about the CUDA_HOME environment variable, I think we can probably just drop that. It's mostly a holdover from an earlier iteration of this recipe. Hopefully that addresses a few concerns. 🙂

@soumith
Contributor

soumith commented May 28, 2019

looks reasonable to me, thanks for the draft PR!

We need to find the compiler's sysroot (which the compiler itself can report) and symlink `libcuda` into it so that the library is found during a build.
We need a way of forwarding the compiler preference to `nvcc`; namely, we want to ensure that the Anaconda C++ compiler is used. This script accomplishes that by adding a flag specifying the right compiler to use.
Provides a simple test for compiling some CUDA code.
@jakirkham
Member Author

After playing with this a bit more, I found it needed a few more changes. In particular, this now contains a small shell script that calls nvcc, as opposed to the symlink it had previously. The shell script is there merely to pass -ccbin with the Anaconda C++ compiler specified (along with any other user-provided arguments), thus making sure the Anaconda compiler is used. This seems to handle a few more use cases.

In addition, I have added a small compilation test that builds some trivial CUDA code.
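The wrapper's job can be pictured roughly like this (a sketch of the idea only, not the PR's script: the paths are made up, and the real wrapper would end in an `exec` of nvcc rather than printing the command line):

```shell
# Illustrative wrapper logic: forward every user argument to the real
# nvcc while forcing the Anaconda C++ compiler via -ccbin. Here we just
# build and print the command line so the behavior is visible; a real
# wrapper would exec it instead.
CUDA_HOME="/usr/local/cuda"                      # hypothetical path
CXX="/opt/conda/bin/x86_64-conda-linux-gnu-c++"  # hypothetical path

nvcc_command() {
    echo "${CUDA_HOME}/bin/nvcc -ccbin ${CXX} $*"
}

nvcc_command -o hello hello.cu
```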

Ensure that CUDA includes are added to the C and C++ compiler flags so
they can be found when building.
@djsutherland djsutherland mentioned this pull request Jul 31, 2019
6 tasks
As the Docker image will know what version it is, just copy that version
into the `nvcc` package when creating it. After all, the point of this
package is to act as a compiler shim for `nvcc` in a Docker container
with that version.
@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/nvcc) and found some lint.

Here's what I've got...

For recipes/nvcc:

  • Package version doesn't match conda spec

This should ensure re-rendering goes smoothly, as re-rendering mocks the
architecture anyway, which could have ill effects when creating the
feedstock.
Should make the linter happier.
@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/nvcc) and found some lint.

Here's what I've got...

For recipes/nvcc:

  • The home item is expected in the about section.
  • The license item is expected in the about section.
  • The summary item is expected in the about section.
  • The recipe must have a build/number section.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/nvcc) and found it was in an excellent condition.

@jakirkham jakirkham changed the title WIP: Shim package to interact with an **existing** NVCC install Shim package to interact with an **existing** NVCC install Aug 15, 2019
@jakirkham jakirkham marked this pull request as ready for review August 15, 2019 17:32
@jakirkham
Member Author

@conda-forge/core, this is ready for a final review. 😄

Builds will fail here as we need to use the CUDA Docker images. However we can't easily use them without this package in place first. So I propose we get this through (as long as we are happy with it). Then we can deal with the rest of the configuration issues on the feedstock.

@@ -0,0 +1,64 @@
{% set name = "nvcc" %}
{% set version = environ.get("CUDA_VER", "9.2") %}
Member

Can you add a recipe/conda_build_config.yaml with:

CUDA_VER:
  - 9.2
  - 10.0

and use the value like set version = CUDA_VER?

Member Author

@jakirkham jakirkham Aug 15, 2019

The 9.2 is really only here to make the linter happy. Personally I'd rather not have it here at all if it could be avoided.

Basically what this is doing is picking up CUDA_VER from the docker image.

@jakirkham
Member Author

Any other thoughts @conda-forge/core? If not, am planning on merging soon 🙂

- libgcc-ng
run_exports:
strong:
- cudatoolkit {{ version }}|{{ version }}.*
Member

@isuruf isuruf Aug 16, 2019

You need the run_exports of the C++ compiler runtime here as well.

Member Author

Why? If the user is using the C++ compiler, that will be added by this anyways.

Member

But the runtimes like libgcc-ng and libstdcxx-ng will not be added. run_exports is not transitive.

Member Author

I don't think users should be relying on nvcc to add these things. If they are using a C or C++ compiler, they should list those requirements explicitly in their recipe.

Member

Then the C and C++ compilers should not be in the run requirements. The way the compiler packages are set up, if the user tries to use a C++ compiler in the docker image, it will not be found unless it is an explicit requirement. Having the explicit requirement means that the run_exports for libgcc-ng is added.
But now, if the user adds nvcc_linux-64, they get the C++ compiler but no run_exports for libgcc-ng.

Member Author

That makes sense. I have dropped the C and C++ compilers from run, so they won't be installed now. The test has been updated to list the C and C++ compilers as requirements of the test.

- test -f "${PREFIX}/etc/conda/{{ state }}.d/{{ PKG_NAME }}_{{ state }}.sh"
{% endfor %}
# Try using the activation scripts.
- source ${PREFIX}/etc/conda/activate.d/{{ PKG_NAME }}_activate.sh
Member

Can you remove this line? It should have been run already

Member Author

We could, but this makes it clearer to the reader what we are testing. Explicit is better than implicit.

Member

But then, we are not testing that CUDA_HOME is automatically set. We are checking that CUDA_HOME is set when that script is run.

Member Author

I see your point. That said, do these test modifications need to be handled in staged-recipes or can they be handled in the feedstock? Note that more work will need to be done in the feedstock anyways.

Member

Okay. If you fix the other issue, then I'll merge. Also, please wait for my review on a PR in the feedstock before this package is released.

Member Author

As a higher-level comment: making these suggestions is good. I just want to make sure we are also progressing on the larger objectives they sit within.

I switched things around so that we now test activation works without running the activation script first. I also added an explicit test of the activation script after running/testing the deactivation script. This allowed a little cleanup around the build step, so hopefully it is a bit nicer. We can discuss more in the feedstock.

Do not explicitly include the C and C++ compilers as users should
explicitly add these as requirements of their package.
This shows users how they should be using `nvcc` with other compilers
and what this looks like. Also tests the expected use case to make sure
it works correctly.

extra:
recipe-maintainers:
- jakirkham
Member

Please add me as well.

Member Author

Sure, I would be happy to have your help here. 🙂 Added you in the last commit.

Make sure that activation works without running the activation script.
Also test the activation script works as expected when run explicitly.
These should no longer be needed as the compiler's environment is
properly activated now after running the activation tests.
@jakirkham
Member Author

I think I've addressed all of your concerns, @isuruf. Could you please take another look? 🙂

@isuruf isuruf merged commit acf336b into conda-forge:master Aug 16, 2019
@jakirkham jakirkham deleted the add_nvcc_wrapper branch August 16, 2019 19:17
@jakirkham
Member Author

Thanks @isuruf! 🎉
