Skip to content

Preserving libtorch_python in package#246

Closed
jeongseok-meta wants to merge 10 commits into
conda-forge:mainfrom
jeongseok-meta:libtorch_python
Closed

Preserving libtorch_python in package#246
jeongseok-meta wants to merge 10 commits into
conda-forge:mainfrom
jeongseok-meta:libtorch_python

Conversation

@jeongseok-meta

@jeongseok-meta jeongseok-meta commented Jul 9, 2024

Copy link
Copy Markdown
Contributor

Our project requires linking to libtorch_python. Currently, the shared library is not included in the official TorchConfig.cmake, so we use find_library() to locate it. However, this method fails to find libtorch_python, although it successfully locates other symlinked libraries in the same directory (e.g., libtorch_cpu or libtorch_global_deps).

+ ls -l /home/conda/feedstock_root/build_artifacts/momentum_1720498881696/_build_env/lib/python3.12/site-packages/torch/lib
total 22012
lrwxrwxrwx  1 conda conda       21 Jul  9 04:23 libc10.so -> ../../../../libc10.so
lrwxrwxrwx  1 conda conda       21 Jul  9 04:23 libshm.so -> ../../../../libshm.so
lrwxrwxrwx  1 conda conda       27 Jul  9 04:23 libtorch_cpu.so -> ../../../../libtorch_cpu.so
lrwxrwxrwx  1 conda conda       35 Jul  9 04:23 libtorch_global_deps.so -> ../../../../libtorch_global_deps.so
-rwxr-xr-x 49 conda conda 22537640 Jun 17 01:29 libtorch_python.so
lrwxrwxrwx  1 conda conda       23 Jul  9 04:23 libtorch.so -> ../../../../libtorch.so

This PR aims to prevent the removal of libtorch_python by this line. Instead, it moves the library to PREFIX/lib and creates a symbolic link in the torch lib folder, similar to other libraries, hoping this change resolves the issue with find_library() not finding the shared library.

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-webservices

Copy link
Copy Markdown
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jeongseok-meta

Copy link
Copy Markdown
Contributor Author

@conda-forge-admin, please rerender

@github-actions

github-actions Bot commented Jul 9, 2024

Copy link
Copy Markdown

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/9851406294.

Comment thread recipe/build.sh
for f in ATen caffe2 tensorpipe torch c10; do
mv torch/include/$f ${PREFIX}/include/$f
done
rm ${PREFIX}/lib/libtorch_python.*

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work AFAIU; libtorch_python.so depends on a specific python version, while the conda package libtorch intentionally does not depend on python at all.

If we want to ship libtorch_python.so, we should most likely package it in the pytorch output (or perhaps create a separate output).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing out that libtorch should not depend on Python. Moving it or creating a separate output seems like a viable solution.

I am relatively new to the build script of this feedstock and find it more complex than other packages I've worked with. Could you please provide some guidance on how to implement your suggestion?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please provide some guidance on how to implement your suggestion?

As a first approach, it should work to "install" libtorch_python.so in build_pytorch.sh, by copying from the build cache (that's created & populated during the execution of build.sh) into $PREFIX/lib.

@hmaarrfk

hmaarrfk commented Jul 9, 2024

Copy link
Copy Markdown
Contributor

you are asking for for the python specific library, you should be able to replace torch_python with torch in your setup script.

@h-vetinari

Copy link
Copy Markdown
Member

you are asking for for the python specific library, you should be able to replace torch_python with torch in your setup script.

We genuinely don't seem to package libtorch_python.so, also not in pytorch itself (which is empty).

@hmaarrfk

hmaarrfk commented Jul 9, 2024

Copy link
Copy Markdown
Contributor
~/miniforge3/pkgs
07:25 $ find -name libtorch_python.so
./pytorch-2.3.1-cpu_mkl_py312h3b258cc_100/lib/python3.12/site-packages/torch/lib/libtorch_python.so
./pytorch-2.3.1-cpu_mkl_py310h75865b9_100/lib/python3.10/site-packages/torch/lib/libtorch_python.so
./pytorch-2.3.1-cpu_generic_py310ha4c588e_0/lib/python3.10/site-packages/torch/lib/libtorch_python.so

in a typical workflow, people often are not linking to it, so i guess we left it in lib so it could be python specific.

@hmaarrfk

hmaarrfk commented Jul 9, 2024

Copy link
Copy Markdown
Contributor

And a computer that has cuda120

~/miniforge3/pkgs
$ find -name libtorch_python.so
./pytorch-2.3.1-cuda120_py310h2c91c31_300/lib/python3.10/site-packages/torch/lib/libtorch_python.so

@hmaarrfk

hmaarrfk commented Jul 9, 2024

Copy link
Copy Markdown
Contributor

We genuinely don't seem to package libtorch_python.so, also not in pytorch itself (which is empty).

I guess that stremlit should be taken with a grain of salt, in this case a 30MB package can't be empty ;)

@h-vetinari

Copy link
Copy Markdown
Member

I guess that streamlit should be taken with a grain of salt, in this case a 30MB package can't be empty ;)

mr-bean-realization

@h-vetinari

h-vetinari commented Jul 10, 2024

Copy link
Copy Markdown
Member

OK, my view of the world is restored (thanks @jaimergp 🙏): the files for pytorch now show up in streamlit, and in particular, libtorch_python.so can be found in $SP_DIR/torch/lib.

It would be within the realm of possibility to move them to $PREFIX/lib and then play around with patchelf to correct the path in the binaries. Here's an example where ray does something similar. Not sure if that's worth it though. 🤷

@hmaarrfk

Copy link
Copy Markdown
Contributor

@jeongseok-meta can you comment as to whether or not you are able to link directly to the non-python libraries?

it also seems that we do include the TorchConfig.cmake file

./libtorch-2.3.1-cuda120_h2b0da52_300/share/cmake/Torch/TorchConfig.cmake

are you using conda-forge cmake or an other?

@jeongseok-meta

Copy link
Copy Markdown
Contributor Author

@hmaarrfk Yes, I can link to other libraries by using find_package(Torch CONFIG REQUIRED) and I am using cmake from conda-forge:

https://github.com/facebookincubator/momentum/blob/0d60cde3a5654c7b21599eefd1cace99da4edb45/pymomentum/CMakeLists.txt#L11

However, the problem is that I want to link to libtorch_python, which is not included in the TorchConfig.cmake.

By the way, thank you for the other comments. I will be AFK for the next few days, but I will catch up when I return.

@hmaarrfk

Copy link
Copy Markdown
Contributor

@jeongseok-meta thanks for giving more context.

In my experience there is often little value in linking to the "Python library" when a "c" library exists.
I can't speak to your exact usecase, but my hunch is that linking to python will give little benefit.

Its also unclear what those at pytorch meant to have as public vs private.

I would consider what you need from the python side, and maybe just avoid using it for maintainability.

One thing that would be helpful for us to understand is, does this work from packages from the pytorch channel? While we try to do it the "conda-forge way" we do try to follow upstream "intentions". So if it works with the pytorch + default channel, then it would give us one extra push to try to make it work here (but again, seems like a very niche usecase, so we would need your help to do it well -- some good suggestions came from h-vetinari)

@isuruf

isuruf commented Aug 2, 2024

Copy link
Copy Markdown
Member

@jeongseok-meta, this is an issue with our split build. I think the best solution is to move $PREFIX/lib/python$PY_VER/site-packages/torch/lib/libtorch_python.so to $PREFIX/lib and make a symlink back at site-packages.

@conda-forge-webservices

Copy link
Copy Markdown
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@jeongseok-meta

Copy link
Copy Markdown
Contributor Author

@hmaarrfk Thank you for sharing your thoughts. Specifically, we need to use python_ivalue.h where the definitions are included in libtorch_python.so for Python binding with PyTorch interop. Unfortunately, we don't have a clear path to avoid depending on the API at the moment, and it's also unclear if PyTorch intended for it to be public or private as well.

One thing that would be helpful for us to understand is, does this work from packages from the pytorch channel?

It works with packages from the PyTorch channel, as we can build and import the Python binding linking to libtorch_python.so without errors locally (pixi.toml of our project)

@jeongseok-meta

Copy link
Copy Markdown
Contributor Author

Hi @isuruf, your solution looks promising! However, I noticed some build failures in some jobs. Do you have any ideas on how to address them?

@hmaarrfk

Copy link
Copy Markdown
Contributor

@conda-forge-admin please rerender

@hmaarrfk

Copy link
Copy Markdown
Contributor

i thought i had access to the GPU runners... maybe they are down.

the package fails to build in docker locally for me.

@hmaarrfk

Copy link
Copy Markdown
Contributor

i think perhaps a new flag was added in gcc 12.4 or something.
conda-forge/ctng-compiler-activation-feedstock#121

perhaps we can try to set the upper bound to 12.3???

@h-vetinari

Copy link
Copy Markdown
Member

the package fails to build in docker locally for me.

Can you post an error log?

i think perhaps a new flag was added in gcc 12.4 or something.

The only thing that changed in our packaging was the way how meson sets a release flag, and we specifically didn't apply that change to 12.4, only 13.x and up.

@hmaarrfk

Copy link
Copy Markdown
Contributor

Here is the log, had to pull it out of my tmux
log.txt

@hmaarrfk

hmaarrfk commented Aug 21, 2024

Copy link
Copy Markdown
Contributor

I think some compilation issues might stem from libmagma having been updated to 2.8 from 2.7.2: conda-forge/libmagma-feedstock#18

That said, i'm not too sure.

2.4.0 seems to build fine with libmagma 2.8.0 https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/10477101071/job/29017474679?pr=250#step:3:734

conda-forge/libmagma-feedstock#21

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants