rPackages.torch: use binary package for installation #328980
b-rodrigues wants to merge 3 commits into NixOS:master
Conversation
When you say "building the package from source", are you referring to the R package torch itself? Please compare the code between this PR and #273536, and also have a look at my other attempts to make later versions of torch work.

I could not successfully build the GPU-capable C binaries from source, but I eventually managed to get the GPU-capable pre-compiled binaries to work, following the example of libtorch-bin. For non-NixOS systems, you also need https://github.com/numtide/nix-gl-host.

I felt the code I wrote in #273536 was very much a hack, and I was not ready to push it to nixpkgs without a few more people looking it over to make improvements. No one commented on the draft PR until now. You shouldn't need a separate

Apologies if something is unclear; I don't have a lot of time at the moment to proofread, but I'm happy to answer questions when I am able to get around to it.
If you can get the GPU version to run without all the messy code I added, I will be happy to close my PR and support this one.
This does seem cleaner if GPU can be made to run.
Great! 6 lines of code changed is much nicer than 70, and much more maintainable.

I suggest making a wiki entry or very visible doc strings as part of this PR if possible, as using nixglhost is neither obvious nor necessary in all cases, and I don't feel comfortable wrapping every call to R with nixglhost silently. The user might also want to know how to download the CPU version, if hard drive space is a concern.

I can't test on NixOS right now as my dev machine is in for service. @jbedo, are you able to test on NixOS? There might be some interest in compiling from source as well.
Thanks for your help @PhDyellow and @jbedo, this PR is ready to be reviewed!
The `torch` package included in `nixpkgs` is based on the binary packages provided by the mlverse team. By default, the CPU version of the package is installed. To enable GPU acceleration, set `config.cudaSupport` to true when importing `nixpkgs` in a `shell.nix`. You will also need to use `nixglhost`. Here is an example of such a `shell.nix`:
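(A sketch only; the way `nixglhost` is built from the fetched sources via its top-level `default.nix`, and the final `nixglhost R` invocation, are assumptions rather than necessarily the PR's exact documentation.)

```nix
{ pkgs ? import <nixpkgs> {
    # CUDA packages are unfree, so both options are needed for the GPU build.
    config.allowUnfree = true;
    config.cudaSupport = true;
  }
}:

let
  nixglhost-sources = pkgs.fetchFromGitHub {
    owner = "numtide";
    repo = "nix-gl-host";
    rev = "main";
    # Replace this with the hash Nix will complain about, TOFU style.
    hash = "";
  };

  # Assumption: the nix-gl-host repository builds from its top-level default.nix.
  nixglhost = pkgs.callPackage "${nixglhost-sources}/default.nix" { };

  # R wrapped so that library(torch) resolves to the Nix-provided package.
  wrappedR = pkgs.rWrapper.override {
    packages = with pkgs.rPackages; [ torch ];
  };
in
pkgs.mkShell {
  buildInputs = [ wrappedR nixglhost ];
  # On non-NixOS systems, start R through nixglhost so the host GPU driver
  # libraries can be found, e.g.:  nixglhost R
}
```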
My understanding is that you only need nixglhost if you are not on NixOS.
This should be the case, but I don't see any patching happening in this PR, so likely it won't work on NixOS either. It should be possible to patch the binaries to operate without tricks on NixOS.
Yes, it is possible to patch the liblantern and libtorch binaries; I did it successfully in #273536. After patching, the binaries ran on both NixOS and nix in CentOS. I didn't try patching the full bundle being installed in this PR, though.
Whether nixglhost works on NixOS is another thing to test.
tested on NixOS and confirmed he needed nixglhost as well.
This is a bug then; the appropriate patchelfing is necessary. nixglhost should not be required on NixOS.
You can use `LD_DEBUG=libs R` to find out which binaries (are loading CUDA and) need the patching.
I have amended the docs to explain this is only for non-NixOS linux and that we are looking for people to fix torch for NixOS and darwin.
PhDyellow left a comment
Couldn't test functionality right now, but apart from comments I have left, everything looks like it is in order.
Documentation looks good too. It made sense to me, but I am familiar with running torch in R on a GPU, so I can't say whether it is clear to someone who has never tried before.
Maybe we leave a note in the language docs for someone to open a PR or issue if they have the hardware and can test it? I don't think it would be hard to add more logic to the expression if needed.
Is there much value in keeping a binary-only package in Nixpkgs rather than out-of-tree? Is it a common dependency?
it wants to download and install the libtorch and liblantern libraries which doesn't work
Our python3Packages.torch can be/is consumed in C++/CMake projects, even though we haven't split libtorch out yet. Is there a way to unvendor libtorch in rPackages.torch? I'm not familiar with liblantern though.
I'll have to study python3Packages.torch but I'm quite a beginner with Nix and doubt I'll be able to do better than this PR 😅
liblantern is an intermediate API between libtorch and R, maintained by the developers of R torch. I think it exists because calling libtorch from R directly was too difficult, and calling libtorch from R via Python was too slow.
It would be good to get everything to compile from source. I would support that effort, but I would need a Nix CUDA expert to help because I kept getting issues with CUDA version compatibility. I was not able to successfully compile liblantern and libtorch despite spending days on the problem.
This PR bypasses all the complexity of trying to compile liblantern and libtorch from source, as well as the complexity of ensuring the correct CUDA libraries are present on the system. Consuming libtorch from python3Packages.torch may be possible, but compiling liblantern against libtorch didn't work easily for @jbedo over at #273536 (comment).
I feel the value of having this binary-only package in nixpkgs is for people who are R users first and Nix users second. torch is on CRAN, and all CRAN packages are automatically included in pkgs.rPackages, so by the principle of least surprise, it should just work. The alternative is to have it fail in a way that directs the user to the Nix R wiki or something similar, with detailed instructions for setting up their first overlay.
Compiling liblantern doesn't seem that difficult in itself; the main issue is it needs an old version of torch and doesn't work with python3Packages.torch. Having it work out of the box, even if we need a binary, is a goal since we want to maximise CRAN/BioC functionality. We do need to patch these binaries though to at least work with NixOS out of the box without tricks.
We do need to patch these binaries though to at least work with NixOS out of the box without tricks.
Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?
Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?
If wrapping liblantern and libtorch worked, that sounds fine to me, but I am not sure we should wrap R in nixglhost by default on NixOS. I don't know whether it would cause any problems, it just seems excessive.
You could have a look at how I patched the binaries over at #273536. The important parts are the various patchelf commands and pkgs.cudaPackages.autoAddOpenGLRunpathHook.
It would be good to get everything to compile from source. I would support that effort, but I would need a Nix CUDA expert to help because I kept getting issues with CUDA version compatibility
We can try working together
Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?
There shouldn't be any need; you just autoPatchelf and/or addDriverRunpath to link the relevant libraries directly.
If wrapping liblantern and libtorch worked, that sounds fine to me, but I am not sure we should wrap R in nixglhost by default on NixOS
Correct, we shouldn't
pkgs.cudaPackages.autoAddOpenGLRunpathHook.
Note it's pkgs.autoAddDriverRunpath now
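For reference, a sketch of how those hooks slot into the override; `old`, `pkgs`, and `lib` are assumed to be in scope from the surrounding overlay, and this mirrors the diff hunk reviewed further down:

```nix
torch = old.torch.overrideAttrs (attrs: {
  nativeBuildInputs = attrs.nativeBuildInputs or [ ] ++ [
    pkgs.autoPatchelfHook      # rewrites the RPATHs of the prebuilt shared objects
    pkgs.autoAddDriverRunpath  # adds the run path for the host's CUDA driver
  ];
  buildInputs = attrs.buildInputs or [ ] ++ lib.optionals pkgs.config.cudaSupport [
    pkgs.cudaPackages.cuda_cudart
  ];
});
```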
I have amended the docs to explain this is only for non-NixOS linux and that we are looking for people to fix torch for NixOS and darwin.
For fixing this for NixOS I would need help, as I'm not running NixOS and don't have experience with patching binaries. Should we merge this one and then continue with another PR, or try to fix it as part of this PR?
Shouldn't this be an rWrapper.override instead?
If we drop into a shell this is not strictly required, but it would be needed if calling R non-interactively, afaik.
I see. This confuses people; let's not advertise this and use rWrapper instead: https://mastodon.acm.org/@zwol@hackers.town/112745541961800402
thanks, done with latest push!
torch = old.torch.overrideAttrs (attrs: {
  preConfigure = ''
    patchShebangs configure
  '';
Am I to understand that the original package was source-built? If so, I think we should keep a source-built torch and add a separate torch-bin attribute for the binary package?
The original R torch package was indeed built from source, but it didn't provide the torch library; it asked users to download and install it on first run.
was indeed built from source, but it didn't provide the torch library; it asked users to download and install it on first run
oh
nativeBuildInputs = attrs.nativeBuildInputs or [ ] ++ [
  pkgs.autoPatchelfHook
  pkgs.autoAddDriverRunpath
];
buildInputs = attrs.buildInputs or [ ] ++ [
  "${lib.getLib R}/lib/R" # libR
  pkgs.zlib
] ++ lib.optionals pkgs.config.cudaSupport [
  pkgs.cudaPackages.cuda_cudart
];
These changes are sufficient to do something like
> library(torch)
> torch_sum(torch_randn(c(10, 10), device="cuda"))
However, normally libtorch_cuda links e.g. libcublas, etc., which I didn't have to explicitly mention. I suppose that means they're not listed in DT_NEEDED and are loaded by dlopen instead. Either way, I expect all related functionality to be broken, and I can't allocate the time to do more tests, unfortunately.

I also see there's a lot of garbage in the outputs, e.g. a vendored copy of libcudart.so. You could link those by telling autoPatchelf to add $ORIGIN, but I'd advise against it and recommend that you delete the vendored libraries. CUDA libraries tend to need ad hoc patchelfing as well; we've had issues with that before. CUDA libraries that originate from PyPI are usually broken, which is why we link the ones from cudaPackages instead.
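A sketch of how the vendored copies could be dropped so that autoPatchelfHook resolves CUDA from cudaPackages instead; the file pattern matched and the use of `postInstall` here are assumptions to verify against the actual bundle layout:

```nix
torch = old.torch.overrideAttrs (attrs: {
  # nativeBuildInputs / buildInputs as in the hunk above

  # Sketch only: delete the CUDA runtime vendored in the prebuilt bundle so that
  # autoPatchelfHook links against cudaPackages.cuda_cudart (in buildInputs)
  # instead of the bundled copy. autoPatchelfHook runs during fixup, i.e. after
  # postInstall, so the deleted files are never patched or linked.
  postInstall = (attrs.postInstall or "") + ''
    find $out -name 'libcudart.so*' -delete
  '';
});
```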
nixglhost-sources = pkgs.fetchFromGitHub {
  owner = "numtide";
  repo = "nix-gl-host";
  rev = "main";
  # Replace this with the hash Nix will complain about, TOFU style.
  hash = "";
};
I think this should be documented in its own chapter rather than in the R section: #264433 (it has been stale for a while; I didn't get the chance to follow up). But maybe it's OK to temporarily merge this section and then replace it with a short reference when the chapter is added.
rpkgs = builtins.attrValues {
  inherit (pkgs.rPackages)
    torch;
};

wrapped_pkgs = pkgs.rWrapper.override {
  packages = [ rpkgs ];
};
Why not just `packages = with rPackages; ...`?
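i.e. something like this sketch, where `wrappedR` is just an illustrative name:

```nix
wrappedR = pkgs.rWrapper.override {
  packages = with pkgs.rPackages; [ torch ];
};
```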
wrapped_pkgs = pkgs.rWrapper.override {
Hmm, personally I read rWrapper.override as a "wrapped interpreter" (wrapped so it knows where to find the packages), not as "wrapped packages"; e.g. I'd write `python = python3.withPackages ...`. I have also seen people write `env = python3.withPackages ...`.
`pkgs` makes me think of a package set, which this value is not.
Description of changes
With this PR, I suggest that we use the prebuilt binary supplied by the mlverse team. Currently, the {torch} package installs, but on first run it wants to download and install the libtorch and liblantern libraries, which doesn't work. Instead, we could use the prebuilt binaries of the {torch} package, which simplifies the whole process, as described in their documentation: https://torch.mlverse.org/docs/articles/installation#pre-built

This PR handles both the CPU and GPU versions for Linux. I have also updated the documentation to explain how to use it in a second commit.
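A rough illustration of the CPU/GPU selection this implies; the URL scheme, version, and flavour names below are placeholders, not the PR's actual values (the real pre-built locations are listed in the mlverse installation docs linked above):

```nix
# Illustration only, not the PR's code: select the CPU or GPU pre-built bundle
# depending on config.cudaSupport.
{ pkgs }:

let
  version = "0.0.0";                                          # placeholder
  flavor = if pkgs.config.cudaSupport then "gpu" else "cpu";  # placeholder names
in
pkgs.fetchurl {
  url = "https://example.invalid/torch/${flavor}/torch_${version}.tar.gz";
  hash = ""; # fill in with the hash Nix reports (TOFU)
}
```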