[BUG] Cannot install through any means #24

Open
Linardos opened this issue Nov 18, 2024 · 12 comments
Labels
documentation Improvements or additions to documentation

Comments

@Linardos

I followed the installation steps exactly as described, but sadly none of the options work.

The package is available on neither pip nor conda:

(gsynth) locolinux2@IN-OTA-232347:~$ pip install gandlf-synth
ERROR: Could not find a version that satisfies the requirement gandlf-synth (from versions: none)
ERROR: No matching distribution found for gandlf-synth
(gsynth) locolinux2@IN-OTA-232347:~$ conda install -c conda-forge gandlf-synth -y
Channels:
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - gandlf-synth

Current channels:

  - https://conda.anaconda.org/conda-forge
  - defaults

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

Nor does it work when cloning the repository and installing directly:

(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ pip install .
Processing /home/locolinux2/GaNDLF-Synth
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting GANDLF@ git+https://github.com/mlcommons/GandLF.git@master (from gandlf_synth==0.0.1.dev0)
  Cloning https://github.com/mlcommons/GandLF.git (to revision master) to /tmp/pip-install-nkslaka2/gandlf_717cd84cc6b14cd39c48d0d1e6c9a5ec
  Running command git clone --filter=blob:none --quiet https://github.com/mlcommons/GandLF.git /tmp/pip-install-nkslaka2/gandlf_717cd84cc6b14cd39c48d0d1e6c9a5ec
  Resolved https://github.com/mlcommons/GandLF.git to commit a1fb3f49f1ef0b148d9c4b0826e840d38d0bae38
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting black==23.11.0 (from gandlf_synth==0.0.1.dev0)
  Using cached black-23.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (66 kB)
Collecting lightning==2.4.0 (from gandlf_synth==0.0.1.dev0)
  Using cached lightning-2.4.0-py3-none-any.whl.metadata (38 kB)
Collecting monai-generative==0.2.3 (from gandlf_synth==0.0.1.dev0)
  Using cached monai_generative-0.2.3-py3-none-any.whl.metadata (4.6 kB)
Collecting deepspeed==0.15.1 (from gandlf_synth==0.0.1.dev0)
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-nkslaka2/deepspeed_584e40eaecaf4247a29a048c4c72290b/setup.py", line 108, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
        File "/tmp/pip-install-nkslaka2/deepspeed_584e40eaecaf4247a29a048c4c72290b/op_builder/builder.py", line 51, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.


@sarthakpati
Contributor

Were you able to install PyTorch?

https://docs.mlcommons.org/GaNDLF-Synth/setup/#installation

@Linardos
Author

Yes

@sarthakpati
Contributor

@szmazurek: can you think of anything for this? I am unable to replicate it on 3 machines (Windows, Ubuntu, Mint). I have put together a small script to get some debugging information from the environment here. Can you think of anything else to add?
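
The script mentioned above is not reproduced in this thread. As a rough illustration only, a minimal sketch of that kind of environment report could look like the following. The function name `collect_env_info` and the default package list are assumptions made for this example, not the maintainers' actual script.

```python
import importlib.metadata
import platform
import sys


def collect_env_info(packages=("torch", "deepspeed", "lightning")):
    """Gather basic facts about the current environment for a bug report.

    `packages` is an illustrative default; pass whatever distributions matter.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            # Version of the installed distribution, if any.
            info[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value if value is not None else 'not installed'}")
```

Pasting the printed lines into an issue makes it easier to compare a failing environment against the machines where the install succeeds.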

@szmazurek
Contributor

Yeah, the PyPI failure is expected: as far as I know, we have not built and uploaded the package there. As for installing from source, it seems you are missing the NVIDIA compiler (nvcc), which is apparently needed by the deepspeed dependency. Can you check whether nvcc is installed, @Linardos? If not, installing it might do the trick. The next thing to check is the PATH setting: make sure all NVIDIA-related binaries are accessible.
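
The two checks suggested here (is nvcc installed, and is it reachable through the environment) can be sketched in a few lines of Python. This is a hedged approximation, not DeepSpeed's actual lookup: the function name `find_cuda_home` and the lookup order (the `CUDA_HOME` variable, then `CUDA_PATH`, then an `nvcc` binary on `PATH`) are assumptions chosen for illustration.

```python
import os
import shutil
from pathlib import Path


def find_cuda_home(environ=None, which=shutil.which):
    """Return a likely CUDA toolkit root, or None if nothing is found.

    Assumed order: CUDA_HOME, then CUDA_PATH, then the directory two
    levels above an nvcc binary on PATH (<root>/bin/nvcc -> <root>).
    """
    environ = os.environ if environ is None else environ
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = environ.get(var)
        if value:
            return Path(value)
    nvcc = which("nvcc")
    if nvcc:
        return Path(nvcc).parent.parent
    return None


if __name__ == "__main__":
    root = find_cuda_home()
    if root is None:
        print("No CUDA toolkit found: nvcc is not on PATH and CUDA_HOME is unset")
    else:
        print(f"Candidate CUDA toolkit root: {root}")
```

If this prints no candidate, the `MissingCUDAException` in the traceback above is the expected outcome of building deepspeed in that environment.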

@sarthakpati
Contributor

sarthakpati commented Nov 19, 2024

If NVCC is needed, perhaps it might make sense to include it in the documentation. I believe installing one of the following (based on the user's system) should be fine:

Thanks for helping us catch this, @Linardos! I am guessing that since all of my (and Szymon's) machines are set up for development, nvcc is automatically found and we don't encounter this.

Relevant issue from DeepSpeed: microsoft/DeepSpeed#2772

EDIT: I also found a cuda-python package on pip, but I think that's only for CUDA 12.

@sarthakpati added the documentation label Nov 19, 2024
@szmazurek
Contributor

Yeah, this would indeed be needed. @Linardos, can you confirm that the issue opened by @sarthakpati (#25) will address that?

@Linardos
Author

I just installed it through pip, but that doesn't seem to solve it. I have CUDA 12.4 on my machine.

(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ pip install nvidia-cuda-nvcc-cu12
Requirement already satisfied: nvidia-cuda-nvcc-cu12 in /home/locolinux2/miniconda3/envs/gsynth/lib/python3.9/site-packages (12.6.77)
(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ pip install .
Processing /home/locolinux2/GaNDLF-Synth
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting GANDLF@ git+https://github.com/mlcommons/GandLF.git@master (from gandlf_synth==0.0.1.dev0)
  Cloning https://github.com/mlcommons/GandLF.git (to revision master) to /tmp/pip-install-kuttpdr9/gandlf_8583bbd08e34436b9794e4167f37ac38
  Running command git clone --filter=blob:none --quiet https://github.com/mlcommons/GandLF.git /tmp/pip-install-kuttpdr9/gandlf_8583bbd08e34436b9794e4167f37ac38
  Resolved https://github.com/mlcommons/GandLF.git to commit 709f6ab59e57782f0b1937b24a1d8a85cd222c42
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting black==23.11.0 (from gandlf_synth==0.0.1.dev0)
  Using cached black-23.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (66 kB)
Collecting lightning==2.4.0 (from gandlf_synth==0.0.1.dev0)
  Using cached lightning-2.4.0-py3-none-any.whl.metadata (38 kB)
Collecting monai-generative==0.2.3 (from gandlf_synth==0.0.1.dev0)
  Using cached monai_generative-0.2.3-py3-none-any.whl.metadata (4.6 kB)
Collecting deepspeed==0.15.1 (from gandlf_synth==0.0.1.dev0)
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-kuttpdr9/deepspeed_fd7409a5b2dd41cda27dd8d978d665d2/setup.py", line 108, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
        File "/tmp/pip-install-kuttpdr9/deepspeed_fd7409a5b2dd41cda27dd8d978d665d2/op_builder/builder.py", line 51, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

However, I installed nvcc through sudo apt install nvidia-cuda-toolkit instead and that worked.

(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ pip install .
Processing /home/locolinux2/GaNDLF-Synth
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting GANDLF@ git+https://github.com/mlcommons/GandLF.git@master (from gandlf_synth==0.0.1.dev0)
  Cloning https://github.com/mlcommons/GandLF.git (to revision master) to /tmp/pip-install-1letf4zy/gandlf_9ec66c16bdc542989ba33faa0c893907
  Running command git clone --filter=blob:none --quiet https://github.com/mlcommons/GandLF.git /tmp/pip-install-1letf4zy/gandlf_9ec66c16bdc542989ba33faa0c893907
  Resolved https://github.com/mlcommons/GandLF.git to commit 709f6ab59e57782f0b1937b24a1d8a85cd222c42
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting black==23.11.0 (from gandlf_synth==0.0.1.dev0)
  Using cached black-23.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (66 kB)
Collecting lightning==2.4.0 (from gandlf_synth==0.0.1.dev0)
  Using cached lightning-2.4.0-py3-none-any.whl.metadata (38 kB)
Collecting monai-generative==0.2.3 (from gandlf_synth==0.0.1.dev0)
  Using cached monai_generative-0.2.3-py3-none-any.whl.metadata (4.6 kB)
Collecting deepspeed==0.15.1 (from gandlf_synth==0.0.1.dev0)
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... done
Collecting click>=8.0.0 (from black==23.11.0->gandlf_synth==0.0.1.dev0)
  Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
...

It seems to have installed successfully.
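
The log above shows nvcc release 11.5 from the apt package, while the machine is reported to have CUDA 12.4. Whether that difference matters for this build is not established in the thread, but it is easy to surface. A minimal sketch, assuming a hypothetical helper named `parse_nvcc_release` that reads the text printed by `nvcc --version`:

```python
import re


def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Line copied from the nvcc output shown in this thread.
sample = "Cuda compilation tools, release 11.5, V11.5.119"
print(parse_nvcc_release(sample))  # (11, 5)
```

The resulting tuple can be compared against the CUDA version the rest of the environment expects before running `pip install .`.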

@sarthakpati
Contributor

I think the "solution" of installing anything with sudo is inherently problematic (security issues, and not everyone has root-level access). Is there any way we can check whether this would work using conda instead?

@Linardos
Author

Linardos commented Nov 20, 2024

This one should work, then. Maybe add that step to the README (I didn't test it, but these seem to be the standard steps with conda):

conda install -c nvidia cudatoolkit
Verify your installation with
nvcc --version

@sarthakpati
Contributor

Cool. In this case, we need to have an explicit dependency on conda.

@szmazurek
Contributor

I do not think that requiring nvcc as an underlying requirement is problematic from the user's perspective; it is basically something you need alongside the CUDA drivers for this package. Falling back to conda is one solution, but I would not push it as the only go-to. I would treat it as a workaround (it can also be included in the container).

@sarthakpati
Contributor

Since it is on the user-level, I think conda should be the primary solution. Anything that is system-level (i.e., sudo install or equivalent) should be the fallback.
