python3Packages.torch-bin: 1.13.1 -> 2.0.0 (#221652)
junjihashimoto wants to merge 5 commits into NixOS:master
Conversation
Out of curiosity, I would like to ask a few questions. It seems that
Thanks a lot!
@breakds IIRC PyTorch distributes their own builds of Triton from the branch OpenAI maintains for them: https://github.com/openai/triton/tree/torch-inductor-stable. I'll try it out later to see if it works. To be clear, I personally don't think a broken
Thanks a lot for the explanation, especially the information about the
Agree that a broken
@junjihashimoto I was unable to build the derivation because:

```
$ nix build --impure -L github:NixOS/nixpkgs/refs/pull/221652/head#python3Packages.torch-bin
python3.10-torch> Sourcing python-remove-tests-dir-hook
python3.10-torch> Sourcing python-catch-conflicts-hook.sh
python3.10-torch> Sourcing python-remove-bin-bytecode-hook.sh
python3.10-torch> Sourcing wheel setup hook
python3.10-torch> Using wheelUnpackPhase
python3.10-torch> Sourcing pip-install-hook
python3.10-torch> Using pipInstallPhase
python3.10-torch> Sourcing python-imports-check-hook.sh
python3.10-torch> Using pythonImportsCheckPhase
python3.10-torch> Sourcing python-namespaces-hook
python3.10-torch> Sourcing python-catch-conflicts-hook.sh
python3.10-torch> unpacking sources
python3.10-torch> Executing wheelUnpackPhase
python3.10-torch> Finished executing wheelUnpackPhase
python3.10-torch> patching sources
python3.10-torch> configuring
python3.10-torch> no configure script, doing nothing
python3.10-torch> building
python3.10-torch> no Makefile or custom buildPhase, doing nothing
python3.10-torch> installing
python3.10-torch> Executing pipInstallPhase
python3.10-torch> /build/dist /build
python3.10-torch> Processing ./torch-2.0.0-cp310-cp310-linux_x86_64.whl
python3.10-torch> Requirement already satisfied: typing-extensions in /nix/store/bndw06wps3i7xpqdk6ryq6wiqg11ggy8-python3.10-typing-extensions-4.5.0/lib/python3.10/site-packages (from torch==2.0.0) (4.5.0)
python3.10-torch> ERROR: Could not find a version that satisfies the requirement sympy (from torch) (from versions: none)
python3.10-torch> ERROR: No matching distribution found for sympy
python3.10-torch>
error: builder for '/nix/store/wrbw0hl6p10cc0x230j8z7f655yal0x4-python3.10-torch-2.0.0.drv' failed with exit code 1
```
Ah shoot, so while Triton may be bundled with their distribution when installing with pip, Nixpkgs' Python builder prevents that, so we need to make sure we install it ourselves. @junjihashimoto you'll probably have to add a
After that's added you'll need to update the
Seems like with the following it at least builds torch and torchvision successfully. Haven't tried any tests though.

```nix
(final: prev: {
  python3Packages = prev.python3Packages.overrideScope (pfinal: pprev: {
    triton-bin = pprev.buildPythonPackage {
      version = "2.0.0";
      pname = "triton";
      format = "wheel";
      dontStrip = true;
      pythonRemoveDeps = [ "cmake" "torch" ];
      nativeBuildInputs = [
        prev.lit
        pprev.pythonRelaxDepsHook
      ];
      propagatedBuildInputs = [
        pprev.filelock
      ];
      src = prev.fetchurl {
        name = "triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl";
        url = "https://download.pytorch.org/whl/triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl";
        hash = "sha256-OIBu6WY/Sw981keQ6WxXk3QInlj0mqxKZggSGqVeJQU=";
      };
    };
    torch-bin = pprev.torch-bin.overrideAttrs (oldAttrs: {
      nativeBuildInputs = oldAttrs.nativeBuildInputs ++ [
        prev.lit
      ];
      propagatedBuildInputs = oldAttrs.propagatedBuildInputs ++ [
        pprev.sympy
        pprev.jinja2
        pprev.networkx
        pprev.filelock
        pfinal.triton-bin
      ];
    });
    torchvision-bin = pprev.torchvision-bin.overrideAttrs (oldAttrs: {
      nativeBuildInputs = oldAttrs.nativeBuildInputs ++ [
        prev.lit
      ];
    });
  });
})
```
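For anyone wanting to try the overlay above locally, it would typically be applied via nixpkgs' `overlays` argument. A minimal sketch, assuming the snippet is saved as `./torch-bin-overlay.nix` (the file name is an assumption):

```nix
# Hypothetical usage of the overlay above; file name is illustrative.
import <nixpkgs> {
  overlays = [ (import ./torch-bin-overlay.nix) ];
}
```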
Force-pushed from 64e70ab to 40e5ffd.
@junjihashimoto those commits helped! Trying to use
Try adding

```
chmod +x $out/lib/<whatever python version>/site-packages/triton/third_party/cuda/bin/ptxas
```

to whatever phase feels appropriate in the triton-bin derivation. After I did that, I was able to use
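The effect of that `chmod +x` can be illustrated with a small Python sketch; the temporary file below is a stand-in for the `ptxas` binary shipped inside the triton wheel, which is unpacked without its executable bit (the path in the real fix is the site-packages one above):

```python
import os
import stat
import tempfile

# Stand-in for the wheel-shipped ptxas binary: freshly written files
# get no execute bits, so they cannot be run until chmod +x is applied.
with tempfile.TemporaryDirectory() as d:
    fake_ptxas = os.path.join(d, "ptxas")
    with open(fake_ptxas, "w") as f:
        f.write("#!/bin/sh\necho ok\n")
    print(os.access(fake_ptxas, os.X_OK))  # False: not executable yet
    # Equivalent of `chmod +x` on the file:
    mode = os.stat(fake_ptxas).st_mode
    os.chmod(fake_ptxas, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    print(os.access(fake_ptxas, os.X_OK))  # True: now runnable
```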
Force-pushed from ca58698 to 7d06d9d.
@breakds @ConnorBaker
I gave this a shot along with #222273. Triton appeared to build, but running the sample test above resulted in this when trying to import triton:
@katanallama I created the environment as follows. Why is it linked to a different file?
Force-pushed from 7d06d9d to 944c56e.
```diff
-  url = "https://download.pytorch.org/whl/cu117/torch-1.13.1%2Bcu117-cp38-cp38-linux_x86_64.whl";
-  hash = "sha256-u/lUbw0Ni1EmPKR5Y3tCaogzX8oANPQs7GPU0y3uBa8=";
+  name = "torch-2.0.0-cp38-cp38-linux_x86_64.whl";
+  url = "https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp38-cp38-linux_x86_64.whl";
```
Had another question: it seems to me that the binaries here would only work when torch is used with CUDA 11.8, is that right?
I tried to build it with CUDA 11.7 and it builds and runs:

```
>>> import torch
>>> torch.version.cuda
'11.7'
```

Could this be a potential problem, if the original torch binary is built with CUDA 11.8?
Thanks!
@breakds There are two CUDA versions involved: the cudatoolkit version and the driver version.
The driver supports multiple versions of cudatoolkit.
A new GPU like the H100 needs a new cudatoolkit to support its new instruction set, known as a compute capability (PTX).
torch-2.0.0+cu118-xxx.whl means that the torch-2.0.0 binary is built with cudatoolkit 11.8.
The important point is that a new GPU like the H100 is only supported by cudatoolkit 11.8 or newer.
https://docs.nvidia.com/deploy/cuda-compatibility/index.html
We can just use the latest NVIDIA driver supporting CUDA 12, or any driver that supports the binary.
For example, torch-1.13.1+cu117 works with an A100 and a CUDA 12 driver, but torch-1.13.1+cu117 does not work with an H100 and a CUDA 12 driver.
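The compatibility rule Junji describes can be sketched as a small Python check. The table below is a hand-picked illustration (the A100 is compute capability sm_80, first supported in cudatoolkit 11.0; the H100 is sm_90, first supported in 11.8), not an exhaustive mapping:

```python
# Minimum cudatoolkit version required for a GPU's compute capability.
# Illustrative subset only; see NVIDIA's compatibility docs for details.
MIN_TOOLKIT_FOR_SM = {
    "sm_80": (11, 0),  # A100
    "sm_90": (11, 8),  # H100
}

def wheel_supports_gpu(wheel_cuda: str, sm: str) -> bool:
    """Can a wheel built with cudatoolkit `wheel_cuda` (e.g. '11.7') drive GPU `sm`?"""
    major, minor = (int(x) for x in wheel_cuda.split("."))
    return (major, minor) >= MIN_TOOLKIT_FOR_SM[sm]

# torch-1.13.1+cu117 works on an A100 but not on an H100:
print(wheel_supports_gpu("11.7", "sm_80"))  # True
print(wheel_supports_gpu("11.7", "sm_90"))  # False
print(wheel_supports_gpu("11.8", "sm_90"))  # True
```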
Thanks for the explanation, Junji!
Force-pushed from 944c56e to 0782be2.
Changed the license of torch-bin to unfreeRedistributable.
I'm very much in support of this change! However, this should definitely go in a separate git commit, and maybe even a separate pull request, so that the affected people can find it and leave comments.
@SomeoneSerge
I've created a separate git commit to update the license to unfreeRedistributable.
The new comment overlaps with the old one; we need to merge them. Thoughts:
- [👍🏻] You already explain that PyTorch is redistributed under BSD-3
- [✅] Keep the links to the CUDA EULA and the Intel Open Source License
- When mentioning the Intel Open Source License, also include a link to the short identifier: https://spdx.org/licenses/Intel.html (kudos @fabaff)
- Add a link to `lib.licenses.issl`: https://www.intel.com/content/dam/develop/external/us/en/documents/pdf/intel-simplified-software-license.pdf
- Explain that Intel's oneAPI and mkl-dnn are free (ASL-2.0)
- Explain that upstream wheels link PyTorch statically against MKL (which is `lib.licenses.issl`: unfree but redistributable)
- Explain that upstream wheels include CUDA applications, subject to the CUDA EULA
- Explain that since the whole thing is distributed as a single package, we have to mark the final derivation as `unfreeRedistributable`

I actually didn't notice any components that refer to the Intel Open Source License, but we should keep the links for reference.
Force-pushed from 0782be2 to 937bc1c.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-46/26872/1
```nix
description = "A language and compiler for custom Deep Learning operations";
homepage = "https://github.com/openai/triton/";
changelog = "https://github.com/openai/triton/releases/tag/v${version}";
license = licenses.mit;
```
Technically, it includes a copy of NVIDIA's ptxas
```nix
pushd $out/${python.sitePackages}/torch/lib
LIBNVRTC=`ls libnvrtc-* | grep -v libnvrtc-builtins`
if [ ! -z "$LIBNVRTC" ] ; then
  ln -s "$LIBNVRTC" libnvrtc.so
```
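In Python terms, the selection that snippet performs is roughly the following (the file names here are made up for illustration):

```python
# Pick the versioned libnvrtc shipped in torch/lib, skipping the
# libnvrtc-builtins library, as the `ls | grep -v` above does.
def pick_libnvrtc(filenames):
    candidates = [f for f in sorted(filenames)
                  if f.startswith("libnvrtc-") and "builtins" not in f]
    return candidates[0] if candidates else None

files = ["libtorch.so", "libnvrtc-builtins.so", "libnvrtc-abc123.so.11.8"]
print(pick_libnvrtc(files))  # libnvrtc-abc123.so.11.8
```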
I have a feeling that using the nix-packaged libnvrtc (`${cudaPackages.cuda_nvrtc}/lib/libnvrtc.so`) should be less fragile. At the very least, we have control over it.
Disclaimer: there's an open issue about libnvrtc.so locating libnvrtc-builtins.so: #225240
```nix
postFixup = let
  rpath = lib.makeLibraryPath [ stdenv.cc.cc.lib ];
in ''
  find $out/${python.sitePackages}/torchaudio/lib -type f \( -name '*.so' -or -name '*.so.*' \) | while read lib; do
```
We have autoPatchelfHook for exactly this kind of logic. It adds libraries from buildInputs and directories like $out/lib to the runpaths of libraries and executables, depending on what they declare as DT_NEEDED.
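As a rough mental model (not autoPatchelfHook's actual implementation), the hook's job looks like this: match each DT_NEEDED entry against the library directories of the build inputs and collect the matching directories into a runpath. The store paths below are made up:

```python
# Toy model of runpath assembly: for every shared-library dependency a
# binary declares (DT_NEEDED), find a providing directory and record it.
def compute_runpath(needed, lib_dirs):
    runpath = []
    for directory_needed in needed:
        for directory, provided in lib_dirs.items():
            if directory_needed in provided and directory not in runpath:
                runpath.append(directory)
    return ":".join(runpath)

lib_dirs = {
    "/nix/store/xxxx-gcc-lib/lib": {"libstdc++.so.6"},  # made-up store paths
    "/nix/store/yyyy-zlib/lib": {"libz.so.1"},
}
print(compute_runpath(["libstdc++.so.6", "libz.so.1"], lib_dirs))
```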
Just an update: we have just merged a source build of Triton into a different location; it lives at pkgs/development/python-modules/openai-triton/default.nix. I think moving all of this over to openai-triton/ is the easiest way to go from here.
Btw, I think we should keep a -bin version of triton as well, because there are differences between that and our source build. For one thing, upstream has already moved to LLVM 17, and we're only using LLVM 15 at the moment. So it's good to have both.
```nix
trove-classifiers = callPackage ../development/python-modules/trove-classifiers { };
```
```nix
triton-bin = callPackage ../development/python-modules/triton/bin.nix { };
```
Similarly, I suggest we rename this into openai-triton-bin
```nix
, jinja2
, networkx
, filelock
, triton-bin
```
Hm... it seems that other derivations (e.g. torchvision-bin) do it this way as well. Nonetheless, I'd rather declare the formal parameter with the source build's name, i.e. I'd take openai-triton but pass openai-triton-bin in the callPackage. This way the overrides to torch and torch-bin look exactly the same, and I don't have to guess which naming scheme to use.
Feel free to ignore this comment though.
```nix
patchelf --add-needed ${zlib.out}/lib/libz.so \
  "$out/${python.sitePackages}/triton/_C/libtriton.so"
'';
```
pythonImportsCheck (or does it still fail because of the circular dependency?)
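For reference, what pythonImportsCheck amounts to is a tiny import probe along these lines (the module names below are stand-ins, not triton's real top-level modules):

```python
import importlib

# Fail the build if any declared top-level module cannot be imported.
def imports_check(modules):
    for name in modules:
        importlib.import_module(name)  # raises ImportError on failure
    return True

# "json" and "os" stand in for the package's real top-level modules.
print(imports_check(["json", "os"]))  # True
```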

Description of changes
PyTorch 2.0 is released.

Things done
- Built with `sandbox = true` set in `nix.conf` (see the Nix manual)
- Ran `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"` (note: all changes have to be committed; also see nixpkgs-review usage)
- Tested the executables in `./result/bin/`