python3Packages.torch-bin: 1.13.1 -> 2.0.0 (#221652)
junjihashimoto wants to merge 5 commits into NixOS:master
Conversation
Out of curiosity, I would like to ask a few questions. It seems that
Thanks a lot!
@breakds IIRC PyTorch distributes their own builds of Triton from the branch OpenAI maintains for them: https://github.com/openai/triton/tree/torch-inductor-stable. I'll try it out later to see if it works. To be clear, I personally don't think a broken
Thanks a lot for the explanation, especially the information about the
Agree that a broken
@junjihashimoto I was unable to build the derivation because:

```
$ nix build --impure -L github:NixOS/nixpkgs/refs/pull/221652/head#python3Packages.torch-bin
python3.10-torch> Sourcing python-remove-tests-dir-hook
python3.10-torch> Sourcing python-catch-conflicts-hook.sh
python3.10-torch> Sourcing python-remove-bin-bytecode-hook.sh
python3.10-torch> Sourcing wheel setup hook
python3.10-torch> Using wheelUnpackPhase
python3.10-torch> Sourcing pip-install-hook
python3.10-torch> Using pipInstallPhase
python3.10-torch> Sourcing python-imports-check-hook.sh
python3.10-torch> Using pythonImportsCheckPhase
python3.10-torch> Sourcing python-namespaces-hook
python3.10-torch> Sourcing python-catch-conflicts-hook.sh
python3.10-torch> unpacking sources
python3.10-torch> Executing wheelUnpackPhase
python3.10-torch> Finished executing wheelUnpackPhase
python3.10-torch> patching sources
python3.10-torch> configuring
python3.10-torch> no configure script, doing nothing
python3.10-torch> building
python3.10-torch> no Makefile or custom buildPhase, doing nothing
python3.10-torch> installing
python3.10-torch> Executing pipInstallPhase
python3.10-torch> /build/dist /build
python3.10-torch> Processing ./torch-2.0.0-cp310-cp310-linux_x86_64.whl
python3.10-torch> Requirement already satisfied: typing-extensions in /nix/store/bndw06wps3i7xpqdk6ryq6wiqg11ggy8-python3.10-typing-extensions-4.5.0/lib/python3.10/site-packages (from torch==2.0.0) (4.5.0)
python3.10-torch> ERROR: Could not find a version that satisfies the requirement sympy (from torch) (from versions: none)
python3.10-torch> ERROR: No matching distribution found for sympy
python3.10-torch>
error: builder for '/nix/store/wrbw0hl6p10cc0x230j8z7f655yal0x4-python3.10-torch-2.0.0.drv' failed with exit code 1
```
Ah shoot, so while Triton may be bundled with their distribution when installing with pip, Nixpkgs' Python builder prevents that, so we need to make sure we install it ourselves. @junjihashimoto you'll probably have to add a
After that's added you'll need to update the
Seems like with the following it at least builds torch and torchvision successfully. Haven't tried any tests though.

```nix
(final: prev: {
  python3Packages = prev.python3Packages.overrideScope (pfinal: pprev: {
    triton-bin = pprev.buildPythonPackage {
      version = "2.0.0";
      pname = "triton";
      format = "wheel";
      dontStrip = true;
      pythonRemoveDeps = [ "cmake" "torch" ];
      nativeBuildInputs = [
        prev.lit
        pprev.pythonRelaxDepsHook
      ];
      propagatedBuildInputs = [
        pprev.filelock
      ];
      src = prev.fetchurl {
        name = "triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl";
        url = "https://download.pytorch.org/whl/triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl";
        hash = "sha256-OIBu6WY/Sw981keQ6WxXk3QInlj0mqxKZggSGqVeJQU=";
      };
    };
    torch-bin = pprev.torch-bin.overrideAttrs (oldAttrs: {
      nativeBuildInputs = oldAttrs.nativeBuildInputs ++ [
        prev.lit
      ];
      propagatedBuildInputs = oldAttrs.propagatedBuildInputs ++ [
        pprev.sympy
        pprev.jinja2
        pprev.networkx
        pprev.filelock
        pfinal.triton-bin
      ];
    });
    torchvision-bin = pprev.torchvision-bin.overrideAttrs (oldAttrs: {
      nativeBuildInputs = oldAttrs.nativeBuildInputs ++ [
        prev.lit
      ];
    });
  });
})
```
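For anyone wanting to try the overlay above locally, it would typically be applied via nixpkgs' `overlays` argument. A minimal sketch, assuming the snippet is saved as `./torch-bin-overlay.nix` (the file name is an assumption):

```nix
# Hypothetical usage of the overlay above; file name is illustrative.
import <nixpkgs> {
  overlays = [ (import ./torch-bin-overlay.nix) ];
}
```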
Force-pushed from 64e70ab to 40e5ffd.
@junjihashimoto those commits helped! Trying to use
Try adding

```
chmod +x $out/lib/<whatever python version>/site-packages/triton/third_party/cuda/bin/ptxas
```

to whatever phase feels appropriate in the triton-bin derivation. After I did that, I was able to use
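The effect of that `chmod +x` can be illustrated with a small Python sketch; the temporary file below is a stand-in for the `ptxas` binary shipped inside the triton wheel, which is unpacked without its executable bit (the path in the real fix is the site-packages one above):

```python
import os
import stat
import tempfile

# Stand-in for the wheel-shipped ptxas binary: freshly written files
# get no execute bits, so they cannot be run until chmod +x is applied.
with tempfile.TemporaryDirectory() as d:
    fake_ptxas = os.path.join(d, "ptxas")
    with open(fake_ptxas, "w") as f:
        f.write("#!/bin/sh\necho ok\n")
    print(os.access(fake_ptxas, os.X_OK))  # False: not executable yet
    # Equivalent of `chmod +x` on the file:
    mode = os.stat(fake_ptxas).st_mode
    os.chmod(fake_ptxas, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    print(os.access(fake_ptxas, os.X_OK))  # True: now runnable
```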
Force-pushed from ca58698 to 7d06d9d.
@breakds @ConnorBaker
I gave this a shot along with #222273. Triton appeared to build, but running the sample test above resulted in this when trying to import triton:
@katanallama I created the environment as follows. Why is it linked to a different file?
Force-pushed from 7d06d9d to 944c56e.
```diff
-  url = "https://download.pytorch.org/whl/cu117/torch-1.13.1%2Bcu117-cp38-cp38-linux_x86_64.whl";
-  hash = "sha256-u/lUbw0Ni1EmPKR5Y3tCaogzX8oANPQs7GPU0y3uBa8=";
+  name = "torch-2.0.0-cp38-cp38-linux_x86_64.whl";
+  url = "https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp38-cp38-linux_x86_64.whl";
```
Had another question: it seems to me that the binaries here would only work when torch is used with CUDA 11.8, is that right?
I tried to build it with CUDA 11.7 and it builds and runs:

```
>>> import torch
>>> torch.version.cuda
'11.7'
```

Could this be a potential problem, if the original torch binary is built with CUDA 11.8?
Thanks!
@breakds There are two CUDA versions involved: the cudatoolkit version and the driver version.
The driver supports multiple versions of cudatoolkit.
A new GPU like the H100 needs a new cudatoolkit to support its new instruction set, known as a compute capability (PTX).
torch-2.0.0+cu118-xxx.whl means that the torch-2.0.0 binary is built with cudatoolkit 11.8.
The important point is that a new GPU like the H100 is only supported by cudatoolkit 11.8 or newer.
https://docs.nvidia.com/deploy/cuda-compatibility/index.html
We can just use the latest NVIDIA driver supporting CUDA 12, or any driver that supports the binary.
For example, torch-1.13.1+cu117 works with an A100 and a CUDA 12 driver, but torch-1.13.1+cu117 does not work with an H100 and a CUDA 12 driver.
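The compatibility rule Junji describes can be sketched as a small Python check. The table below is a hand-picked illustration (the A100 is compute capability sm_80, first supported in cudatoolkit 11.0; the H100 is sm_90, first supported in 11.8), not an exhaustive mapping:

```python
# Minimum cudatoolkit version required for a GPU's compute capability.
# Illustrative subset only; see NVIDIA's compatibility docs for details.
MIN_TOOLKIT_FOR_SM = {
    "sm_80": (11, 0),  # A100
    "sm_90": (11, 8),  # H100
}

def wheel_supports_gpu(wheel_cuda: str, sm: str) -> bool:
    """Can a wheel built with cudatoolkit `wheel_cuda` (e.g. '11.7') drive GPU `sm`?"""
    major, minor = (int(x) for x in wheel_cuda.split("."))
    return (major, minor) >= MIN_TOOLKIT_FOR_SM[sm]

# torch-1.13.1+cu117 works on an A100 but not on an H100:
print(wheel_supports_gpu("11.7", "sm_80"))  # True
print(wheel_supports_gpu("11.7", "sm_90"))  # False
print(wheel_supports_gpu("11.8", "sm_90"))  # True
```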
Thanks for the explanation, Junji!
Force-pushed from 944c56e to 0782be2.
Changed the license of torch-bin to unfreeRedistributable.
I'm very much in support of this change! However, this should definitely go in a separate git commit, and maybe even a separate pull request, so that the affected people can find it and leave comments.
@SomeoneSerge
I've created a separate git commit to update the license to unfreeRedistributable.
The new comment overlaps with the old one; we need to merge them. Thoughts:
- [👍🏻] You already explain that PyTorch is redistributed under BSD-3
- [✅] Keep the links to the CUDA EULA and the Intel Open Source License
- When mentioning the Intel Open Source License, also include a link to the short identifier: https://spdx.org/licenses/Intel.html (kudos @fabaff)
- Add a link to `lib.licenses.issl`: https://www.intel.com/content/dam/develop/external/us/en/documents/pdf/intel-simplified-software-license.pdf
- Explain that Intel's oneAPI and mkl-dnn are free (ASL-2.0)
- Explain that upstream wheels link PyTorch statically against MKL (which is `lib.licenses.issl`: unfree but redistributable)
- Explain that upstream wheels include CUDA applications, subject to the CUDA EULA
- Explain that since the whole thing is distributed as a single package, we have to mark the final derivation as `unfreeRedistributable`

I actually didn't notice any components that refer to the Intel Open Source License, but we should keep the links for reference.
Force-pushed from 0782be2 to 937bc1c.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-46/26872/1
```nix
description = "A language and compiler for custom Deep Learning operations";
homepage = "https://github.com/openai/triton/";
changelog = "https://github.com/openai/triton/releases/tag/v${version}";
license = licenses.mit;
```
Technically, it includes a copy of NVIDIA's ptxas
```nix
pushd $out/${python.sitePackages}/torch/lib
LIBNVRTC=`ls libnvrtc-* | grep -v libnvrtc-builtins`
if [ ! -z "$LIBNVRTC" ] ; then
  ln -s "$LIBNVRTC" libnvrtc.so
```
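In Python terms, the selection that snippet performs is roughly the following (the file names here are made up for illustration):

```python
# Pick the versioned libnvrtc shipped in torch/lib, skipping the
# libnvrtc-builtins library, as the `ls | grep -v` above does.
def pick_libnvrtc(filenames):
    candidates = [f for f in sorted(filenames)
                  if f.startswith("libnvrtc-") and "builtins" not in f]
    return candidates[0] if candidates else None

files = ["libtorch.so", "libnvrtc-builtins.so", "libnvrtc-abc123.so.11.8"]
print(pick_libnvrtc(files))  # libnvrtc-abc123.so.11.8
```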
I have a feeling that using the nix-packaged libnvrtc (`${cudaPackages.cuda_nvrtc}/lib/libnvrtc.so`) should be less fragile. At the very least, we have control over it.
Disclaimer: there's an open issue about libnvrtc.so locating libnvrtc-builtins.so: #225240
```nix
postFixup = let
  rpath = lib.makeLibraryPath [ stdenv.cc.cc.lib ];
in ''
  find $out/${python.sitePackages}/torchaudio/lib -type f \( -name '*.so' -or -name '*.so.*' \) | while read lib; do
```
We have autoPatchelfHook for exactly this kind of logic. It adds libraries from buildInputs and directories like $out/lib to the runpaths of libraries and executables, depending on what they declare as DT_NEEDED.
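As a rough mental model (not autoPatchelfHook's actual implementation), the hook's job looks like this: match each DT_NEEDED entry against the library directories of the build inputs and collect the matching directories into a runpath. The store paths below are made up:

```python
# Toy model of runpath assembly: for every shared-library dependency a
# binary declares (DT_NEEDED), find a providing directory and record it.
def compute_runpath(needed, lib_dirs):
    runpath = []
    for directory_needed in needed:
        for directory, provided in lib_dirs.items():
            if directory_needed in provided and directory not in runpath:
                runpath.append(directory)
    return ":".join(runpath)

lib_dirs = {
    "/nix/store/xxxx-gcc-lib/lib": {"libstdc++.so.6"},  # made-up store paths
    "/nix/store/yyyy-zlib/lib": {"libz.so.1"},
}
print(compute_runpath(["libstdc++.so.6", "libz.so.1"], lib_dirs))
```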
Just an update: we have just merged a source build of Triton into a different location; it lives at pkgs/development/python-modules/openai-triton/default.nix. I think moving all of this over to openai-triton/ is the easiest way to go from here.
Btw, I think we should keep a -bin version of triton as well, because there are differences between that and our source build. For one thing, upstream has already moved to LLVM 17, and we're only using LLVM 15 at the moment. So it's good to have both.
```nix
trove-classifiers = callPackage ../development/python-modules/trove-classifiers { };
```
```nix
triton-bin = callPackage ../development/python-modules/triton/bin.nix { };
```
Similarly, I suggest we rename this into openai-triton-bin
```nix
, jinja2
, networkx
, filelock
, triton-bin
```
Hm... it seems that other derivations (e.g. torchvision-bin) do it this way as well. Nonetheless, I'd rather declare the formal parameter with the source build's name, i.e. I'd take openai-triton but pass openai-triton-bin in the callPackage. This way the overrides to torch and torch-bin look exactly the same, and I don't have to guess which naming scheme to use.
Feel free to ignore this comment though.
```nix
patchelf --add-needed ${zlib.out}/lib/libz.so \
  "$out/${python.sitePackages}/triton/_C/libtriton.so"
'';
```
pythonImportsCheck (or does it still fail because of the circular dependency?)
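For reference, what pythonImportsCheck amounts to is a tiny import probe along these lines (the module names below are stand-ins, not triton's real top-level modules):

```python
import importlib

# Fail the build if any declared top-level module cannot be imported.
def imports_check(modules):
    for name in modules:
        importlib.import_module(name)  # raises ImportError on failure
    return True

# "json" and "os" stand in for the package's real top-level modules.
print(imports_check(["json", "os"]))  # True
```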

Description of changes
PyTorch 2.0 is released.

Things done
- Built with `sandbox = true` set in `nix.conf` (see the Nix manual)
- Ran `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"` (note: all changes have to be committed; also see nixpkgs-review usage)
- Tested the executables in `./result/bin/`