Skip to content

python3Packages.nvidia-ml-py: init 11.515.48#189028

Merged
SuperSandro2000 merged 2 commits intoNixOS:masterfrom
GaetanLepage:nvidia-ml-py
Oct 18, 2022
Merged

python3Packages.nvidia-ml-py: init 11.515.48#189028
SuperSandro2000 merged 2 commits intoNixOS:masterfrom
GaetanLepage:nvidia-ml-py

Conversation

@GaetanLepage
Copy link
Contributor

@GaetanLepage GaetanLepage commented Aug 30, 2022

Description of changes

Adding the nvidia-ml-py bindings for the NVIDIA Management Library.
This is a requirement for nvitop which I would like to package next.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added the 6.topic: python Python is a high-level, general-purpose programming language. label Aug 30, 2022
@ofborg ofborg bot added 8.has: package (new) This PR adds a new package 11.by: package-maintainer This PR was created by a maintainer of all the package it changes. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. labels Aug 30, 2022
@GaetanLepage GaetanLepage changed the title nvidia-ml-py: init 11.515.48 python3Packages.nvidia-ml-py: init 11.515.48 Aug 31, 2022
@GaetanLepage
Copy link
Contributor Author

Hello @FRidh and @jonringer !

I think this PR should be ready.
Please, whenever you have the time, could you have a look at it ?
I am of course willing to address any necessary modification.

Thank you very much in advance :)

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/need-pr-review-nvidia-ml-py/22020/1

@FRidh
Copy link
Member

FRidh commented Sep 26, 2022

Not quite.

nvmlLib = CDLL("libnvidia-ml.so.1")

needs to hardcode the absolute path to the library.


Nevermind, this is provided by the NVIDIA driver. Still, probably best to hardcode a path to the stub in the cuda toolkit. cc @SomeoneSerge

@FRidh FRidh added the 6.topic: cuda Parallel computing platform and API label Sep 26, 2022
@SomeoneSerge
Copy link
Contributor

RE: ctypes.CDLL("libnvidia-ml.so.1")

I think we want this line to resolve into /run/opengl-driver/lib/libnvidia-ml.so

@SomeoneSerge
Copy link
Contributor

I guess we can preserve LD_LIBRARY_PATH behaviour if we try paths in this order:

''
try:
    nvmlLib = CDLL("libnvidia-ml.so.1")
except OSError:
    nvmlLib = CDLL("${addOpenGLRunpath.driverLink}")
''

Idk if it affects nixGL in any way

@GaetanLepage
Copy link
Contributor Author

@SomeoneSerge is your suggestion supposed to be a patch ?
I don't really see what I should change...
Thank you anyway for your help !

@SomeoneSerge
Copy link
Contributor

@GaetanLepage

There's the line 1679 in pynvml.py that @ FRidh mentioned:

                         nvmlLib = CDLL("libnvidia-ml.so.1")

If you run this on NixOS it's going to fail with OSError, because the directory with libnvidia-ml.so.1 won't be in the search path list. I suggest we catch this error and re-try with this library's absolute path. This library seems to be driver-specific (like libcuda.so), so instead of a path in /nix/store we probably should use a path in /run/opengl-driver/lib, deployed by NixOS at runtime

Copy link
Contributor

@jonringer jonringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as others have mentioned, if this is exposing the functionality of nvidia drivers, you need to have the CDLL call resolve to nixos driver path /run/opengl-driver/lib/<lib>

@SuperSandro2000
Copy link
Member

Missing tests or doCheck = false and pythonImportsCheck.

@GaetanLepage GaetanLepage force-pushed the nvidia-ml-py branch 3 times, most recently from 949f782 to 3cfac44 Compare September 29, 2022 07:10
@FRidh
Copy link
Member

FRidh commented Sep 29, 2022

Could we not fix this in the stub and have all packages patch the path to the stub instead?

@GaetanLepage
Copy link
Contributor Author

GaetanLepage commented Sep 29, 2022

Am I then supposed to use substituteInPlace to replace the string libnvidia-ml.so.1 by /run/opengl-driver/lib/libnvidia-ml.so.1 in pynvml.py or is there something smarter to do ?

@SomeoneSerge
Copy link
Contributor

Could we not fix this in the stub and have all packages patch the path to the stub instead?

I'm not sure I follow, what exactly are you suggesting?
Trying to use e.g. nvidia-smi linked against the stub won't work

substituteInPlace to replace the string libnvidia-ml.so.1 by /run/opengl-driver/lib/libnvidia-ml.so.1

It's crude, but that's what i was thinking of

@GaetanLepage
Copy link
Contributor Author

Could we not fix this in the stub and have all packages patch the path to the stub instead?

I'm not sure I follow, what exactly are you suggesting? Trying to use e.g. nvidia-smi linked against the stub won't work

substituteInPlace to replace the string libnvidia-ml.so.1 by /run/opengl-driver/lib/libnvidia-ml.so.1

It's crude, but that's what i was thinking of

done

@FRidh
Copy link
Member

FRidh commented Sep 29, 2022

I'm not sure I follow, what exactly are you suggesting?
Trying to use e.g. nvidia-smi linked against the stub won't work

Ah okay, I thought that would be possible. Refer from here to the stub, and then in the stub refer to the driver library.

@SomeoneSerge
Copy link
Contributor

✅ Tested the runtime with

# shell.nix
let
  nixpkgs = builtins.getFlake github:GaetanLepage/nixpkgs/nvidia-ml-py;
  pkgs = import nixpkgs { config.cudaSupport = true; config.allowUnfree = true; };
  py = pkgs.python3.withPackages (ps: with ps; [
    nvidia-ml-py
  ]);
in
with pkgs;
mkShell {
  packages = [ py ];
}

and

# test.py
import pynvml

if __name__ == "__main__":
    pynvml.nvmlInit()

    print(pynvml.nvmlSystemGetDriverVersion())

    n_devices = pynvml.nvmlDeviceGetCount()
    devices = [
        pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(i))
        for i in range(n_devices)
    ]
    print(f"{n_devices} devices:")
    print("\n".join(devices))

@GaetanLepage
Copy link
Contributor Author

Can the libnvidia-ml.so.1 hard linking as /run/opengl-driver/lib/libnvidia-ml.so.1 pose some problems on non-NixOs systems ?

@FRidh
Copy link
Member

FRidh commented Sep 29, 2022

Can the libnvidia-ml.so.1 hard linking as /run/opengl-driver/lib/libnvidia-ml.so.1 pose some problems on non-NixOs systems ?

Yes, this Nix package with that patch will only be usable on NixOS.

@SomeoneSerge
Copy link
Contributor

Can the libnvidia-ml.so.1 hard linking as /run/opengl-driver/lib/libnvidia-ml.so.1 pose some problems on non-NixOs systems ?

This why I suggested a try-except patch instead: with it non-nixos users can set LD_LIBRARY_PATH=... as a workaround

@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented Sep 29, 2022

I don't find any of this mandatory, but if it makes sense for you, I meant something like:

diff --git a/pynvml.py b/pynvml.py
index 9e094c6..727716b 100644
--- a/pynvml.py
+++ b/pynvml.py
@@ -1676,7 +1676,11 @@ def _LoadNvmlLibrary():
                             nvmlLib = CDLL(os.path.join(os.getenv("ProgramFiles", "C:/Program Files"), "NVIDIA Corporation/NVSMI/nvml.dll"))
                     else:
                         # assume linux
-                        nvmlLib = CDLL("libnvidia-ml.so.1")
+                        try:
+                            nvmlLib = CDLL("libnvidia-ml.so.1")
+                        except OSError:
+                            # Assume NixOS
+                            nvmlLib = CDLL("/run/opengl-driver/lib/libnvidia-ml.so.1")
                 except OSError as ose:
                     _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
                 if (nvmlLib == None):

I would, maybe, define this patch in Nix (via writeText?), and use the addOpenGLRunpath.driverLink value instead of /run/opengl-driver/... because there's an intent to rename the "opengl-driver" eventually.

Maybe it would be even simpler to substitute /run/opengl-driver/lib with driverLink after the patch phase (the substitute would require a comment explaining that it's there in case of a rename, and the CDLL line in the diff would require a comment that this gets substituted later)

@GaetanLepage
Copy link
Contributor Author

Can the libnvidia-ml.so.1 hard linking as /run/opengl-driver/lib/libnvidia-ml.so.1 pose some problems on non-NixOs systems ?

This why I suggested a try-except patch instead: with it non-nixos users can set LD_LIBRARY_PATH=... as a workaround

Something like this ?

try:
    nvmlLib = CDLL("libnvidia-ml.so.1")
except OSError as ose:
    nvmlLib = CDLL("/run/opengl-driver/lib/libnvidia-ml.so.1")

@GaetanLepage
Copy link
Contributor Author

I would, maybe, define this patch in Nix (via writeText?), and use the addOpenGLRunpath.driverLink value instead of /run/opengl-driver/... because there's an intent to rename the "opengl-driver" eventually.

Maybe it would be even simpler to substitute /run/opengl-driver/lib with driverLink after the patch phase (the substitute would require a comment explaining that it's there in case of a rename, and the CDLL line in the diff would require a comment that this gets substituted later)

I am not sure how to properly do this...
Do I need to create a separate libnvidia-path.patch file and reference it in the derivation ?

@SomeoneSerge
Copy link
Contributor

buildPythonPackage {
    # ...
    patches = [
        ./0001-locate-libnvidia-ml.so.1-on-NixOS.patch
    ];
    # ...
}

or

buildPythonPackage {
    # ...
    patches = [
        (writeText "0001-locate-libnvidia-ml.so.1-on-NixOS.patch" ''
            ...
        '')
    ];
    # ...
}

@GaetanLepage
Copy link
Contributor Author

GaetanLepage commented Sep 30, 2022

Ok, I have adopted the patch strategy.
It should be ok now.

@SomeoneSerge could you please test on your system ?

@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented Oct 1, 2022

@SuperSandro2000
Copy link
Member

SuperSandro2000 commented Oct 1, 2022

or

buildPythonPackage {
    # ...
    patches = [
        (writeText "0001-locate-libnvidia-ml.so.1-on-NixOS.patch" ''
            ...
        '')
    ];
    # ...
}

Please don't do that. It needlessly creates a derivation.

Last thing to choose is whether we account for the value of addOpenGLRunpath.driverLink right now, or later in #158079

This is a WIP RFC. Please don't adopt to it yet. It might change or not come into place for a few more months.

@bobby285271 bobby285271 added the 12.approvals: 1 This PR was reviewed and approved by one person. label Oct 1, 2022
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. and removed 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. labels Oct 12, 2022
@bobby285271 bobby285271 added 12.approvals: 2 This PR was reviewed and approved by two persons. and removed 12.approvals: 1 This PR was reviewed and approved by one person. labels Oct 12, 2022
@SuperSandro2000
Copy link
Member

@ofborg build python310Packages.nvidia-ml-py

@SuperSandro2000 SuperSandro2000 merged commit d93b595 into NixOS:master Oct 18, 2022
@GaetanLepage GaetanLepage deleted the nvidia-ml-py branch October 18, 2022 20:51
@GaetanLepage GaetanLepage mentioned this pull request Oct 19, 2022
13 tasks
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/python-dll-import/22594/3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: cuda Parallel computing platform and API 6.topic: python Python is a high-level, general-purpose programming language. 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 11.by: package-maintainer This PR was created by a maintainer of all the package it changes. 12.approvals: 2 This PR was reviewed and approved by two persons.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants