python3Packages.torch: allow lazy loading libnvrtc #297590
SomeoneSerge wants to merge 5 commits into NixOS:master from
I tried this against my setup using torch-bin and the issue persists; do the fixes in torch/default.nix need to be applied to bin.nix?
I didn't look at torch-bin although it's only natural it should be broken the same way.
I also noticed a typo in the setup-hook because of which the libnvrtc path got lost in torch too.
newPath="${newPath}${elfAddRunpathSuffix:+:}${elfAddRunpathSuffix}"
newPath="${newPath}${elfAddRunpathsSuffix:+:}${elfAddRunpathsSuffix}"
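The two lines above differ only in the variable name (`elfAddRunpathSuffix` versus `elfAddRunpathsSuffix`). A minimal sketch of why such a typo fails silently, assuming the plural name is the one the hook actually sets; the store paths are made-up placeholders:

```shell
#!/usr/bin/env bash
# Illustration only: variable names mirror the two quoted lines,
# the values are hypothetical placeholders.
newPath="/nix/store/example-torch/lib"
elfAddRunpathsSuffix="/nix/store/example-cuda_nvrtc/lib"

# Misspelled name (assumed to be the typo): the variable is unset, so both
# expansions are empty and the suffix is dropped without any error.
withTypo="${newPath}${elfAddRunpathSuffix:+:}${elfAddRunpathSuffix}"

# Matching name: ${var:+:} emits ":" only because the suffix is non-empty.
withFix="${newPath}${elfAddRunpathsSuffix:+:}${elfAddRunpathsSuffix}"

echo "typo: $withTypo"
echo "fix:  $withFix"
```

Running a script like this with `set -u` would turn the plain `${elfAddRunpathSuffix}` reference into an "unbound variable" error instead of an empty string.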
pkgs/development/cuda-modules/setup-hooks/auto-fix-elf-files.sh
If I understood correctly, instead of patching elf files for each path, we accumulate all the paths in an array and patch everything at once in the end, letting us change the order.
I really like this approach. We can only specify "insert before" currently, which isn't a complete way to control the path ordering, but I guess it's amply sufficient in practice.
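A rough sketch of the "accumulate first, patch once" idea described in this comment. The names `prependRunpaths`, `joinRunpaths` and `buildRunpath` are hypothetical and are not taken from auto-fix-elf-files.sh; the paths are placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: names and paths are illustrative only.

# Join all arguments with ":" into a single runpath string.
joinRunpaths() {
    local IFS=:
    echo "$*"
}

declare -a prependRunpaths=()

# Hooks only record the entries they want; nothing is patched at this point,
# so the order can still be adjusted before the final pass.
prependRunpaths+=( "/run/opengl-driver/lib" )
prependRunpaths+=( "/nix/store/example-cuda_nvrtc/lib" )

# Build the final runpath: accumulated entries first, then the original one.
buildRunpath() {
    local origRunpath="$1"
    shift
    local -a parts=( "$@" )
    if [[ -n "$origRunpath" ]]; then
        parts+=( "$origRunpath" )
    fi
    joinRunpaths "${parts[@]}"
}

finalRunpath="$(buildRunpath "/nix/store/example-torch/lib" "${prependRunpaths[@]}")"
echo "$finalRunpath"
# A real hook would then apply the result once per ELF file, e.g.
#   patchelf --set-rpath "$finalRunpath" "$file"
```

Because the entries live in an array until the last step, reordering them is an array operation rather than a second round of `patchelf` calls.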
If autoFixElfFiles is only used there, I wonder if it wouldn't be simpler to inline it, as much as I liked my higher-order solution 😛. Another option, if we want to keep the code nicely modular in those different bash functions, would be to hardcode elfAddRunpathsAction instead of fixAction. Unless you envision autoFixElfFiles being useful for other things?
pkgs/development/cuda-modules/setup-hooks/auto-fix-elf-files.sh
python3Packages.torch-bin: fix lazy nvrtc
Force-pushed: 9294ff8 to 5ca0617
Force-pushed: 5ca0617 to 026df8f
omg, I recovered the string-splitting in the last force-push, but I broke cuda_compat again because I do the string-splitting too late 🙈
Is this PR mainly waiting on the conflicts being resolved? I see the comments from #296179 about Jetson being affected; is that still a concern? Thanks!
Pretty much, the PR stalled after the hooks were moved from
EDIT: I'm also getting more skeptical about the bash approach, maybe we should've gone with e.g.
@SomeoneSerge Thanks for the clarification!
Is this related to the following warnings that I saw recently?
trace: warning: cudaPackages.autoAddDriverRunpath is deprecated, use pkgs.autoAddDriverRunpath instead
trace: warning: cudaPackages.autoFixElfFiles is deprecated, use pkgs.autoFixElfFiles instead
trace: warning: cudaPackages.autoAddOpenGLRunpathHook is deprecated, use pkgs.autoAddDriverRunpath instead
I saw your fix mentioned in another thread which seems like a good temporary solution. Is it not acceptable to patch it like that at this moment?
I also find that bash scripts are not the easiest to read and to maintain ... but it is probably my problem :(
Yes and you're welcome to go ahead and implement it!
Would these be relevant to tensorflow?
Does it dlopen libnvrtc?
+arrayInsertBefore() {
+    local -n arrayRef="$1" # Namerefs, bash >= 4.3:
+    local pattern="$2"
+    local item="$3"
+    shift 3

-    if [[ $# -eq 0 ]]; then
-        echo "addCudaCompatRunpath: no library path provided" >&2
-        exit 1
-    elif [[ $# -gt 1 ]]; then
-        echo "addCudaCompatRunpath: too many arguments" >&2
-        exit 1
-    elif [[ "$1" == "" ]]; then
-        echo "addCudaCompatRunpath: empty library path" >&2
-        exit 1
-    else
-        libPath="$1"
-    fi
+    local i
+    local foundMatch=

-    origRpath="$(patchelf --print-rpath "$libPath")"
-    patchelf --set-rpath "@libcudaPath@:$origRpath" "$libPath"
+    local -a newArray
+    for i in "${arrayRef[@]}" ; do
+        if [[ "$i" == "$pattern" ]] ; then
+            newArray+=( "$item" )
+            foundMatch=1
+        fi
+        newArray+=( "$i" )
+    done
+    if [[ -z "$foundMatch" ]] ; then
+        newArray+=( "$item" )
+    fi
+    arrayRef=( "${newArray[@]}" )
 }

-postFixupHooks+=("autoFixElfFiles addCudaCompatRunpath")

+if [[ -n "@libcudaPath@" ]] ; then
+    arrayInsertBefore elfPrependRunpaths "@driverLink@/lib" "@libcudaPath@"
+fi
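To make the behaviour of `arrayInsertBefore` from the diff above concrete, here is a small standalone sketch. The function body is copied from the quoted lines; the array names and paths in the usage part are made up for illustration:

```shell
#!/usr/bin/env bash
# arrayInsertBefore as quoted in the diff (namerefs need bash >= 4.3).
arrayInsertBefore() {
    local -n arrayRef="$1"
    local pattern="$2"
    local item="$3"
    shift 3

    local i
    local foundMatch=
    local -a newArray
    for i in "${arrayRef[@]}" ; do
        # Insert the item in front of every element equal to the pattern.
        if [[ "$i" == "$pattern" ]] ; then
            newArray+=( "$item" )
            foundMatch=1
        fi
        newArray+=( "$i" )
    done
    # No match at all: fall back to appending the item at the end.
    if [[ -z "$foundMatch" ]] ; then
        newArray+=( "$item" )
    fi
    arrayRef=( "${newArray[@]}" )
}

# Hypothetical inputs, not values from the PR.
paths=( "/a/lib" "/run/opengl-driver/lib" "/b/lib" )
arrayInsertBefore paths "/run/opengl-driver/lib" "/compat/lib"
printf '%s\n' "${paths[@]}"

fallback=( "/a/lib" )
arrayInsertBefore fallback "/missing/lib" "/compat/lib"
printf '%s\n' "${fallback[@]}"
```

In the first call the new entry lands directly before the matching element; in the second call nothing matches, so the entry is appended. The caller's array must not itself be named `arrayRef`, since the nameref would then refer to itself.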
Stale for a long time, need to revisit the motivation and the situation, etc, etc. EDIT: Maybe closed prematurely? #461334
Description of changes
Fixes #296179, refactors autoAddDriverRunpath and autoAddCudaCompatHook.
CC @yannham
Things done
nix.conf? (See Nix manual)
sandbox = relaxed
sandbox = true
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
(./result/bin/)