Added fbgemm recipe building fbgemm, fbgemm-gpu, and fbgemm-gpu-genai #31820

Open
das-intensity wants to merge 6 commits into conda-forge:main from das-intensity:fbgemm

@das-intensity
Contributor

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Package does not ship static libraries. If static libraries are needed, follow CFEP-18.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

@github-actions
Contributor

github-actions Bot commented Jan 1, 2026

Hi! This is the staged-recipes linter and your PR looks excellent but I have some suggestions.

File-specific lints and/or hints:

  • recipes/fbgemm/meta.yaml:
    • hints:
      • It looks like you are submitting a multi-output recipe. In these cases, the correct name for the feedstock is ambiguous, and our infrastructure defaults to the top-level package.name field. Please add a feedstock-name entry in the extra section.
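
For reference, a minimal sketch of what that entry might look like in recipes/fbgemm/meta.yaml. The value `fbgemm` is an assumption (another name could be chosen), and the maintainer handle is simply this PR's author.

```yaml
extra:
  # Assumption: the feedstock is named after the top-level package.
  feedstock-name: fbgemm
  recipe-maintainers:
    - das-intensity
```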

@conda-forge-admin
Contributor

conda-forge-admin commented Jan 1, 2026

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/fbgemm/meta.yaml, recipes/asmjit/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipes/fbgemm/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

For recipes/asmjit/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/20698834647. Examine the logs at this URL for more detail.

@das-intensity das-intensity mentioned this pull request Jan 2, 2026
10 tasks
Member

@h-vetinari h-vetinari left a comment

Thanks for the work on this. This will need some more iteration. Perhaps also consider writing this as a v1 recipe (not required, but should be beneficial, e.g. for things like the git checkout stuff).

Comment thread recipes/fbgemm/meta.yaml Outdated
Comment thread recipes/fbgemm/meta.yaml
number: 0
skip: true # [py<38]
skip: true # [win]
skip: true # [aarch64] # git_url source requires git on build system, problematic for cross-compilation
Member

Even if that were the case (which I doubt), we would be able to use $BUILD_PREFIX/bin/git

Contributor Author

So interestingly it uses git to clone, but then says git isn't available. See here: https://gist.github.com/das-intensity/7f5a5bc9d238bbcd63940863a6ab3404

Member

Try adding

- git
- git-lfs

to the build environment. For whatever reason, the checkout procedure tries to look in there

FileNotFoundError: [Errno 2] No such file or directory: '/home/conda/staged-recipes/build_artifacts/fbgemm_1767418241898/_build_env/bin/git'

git-lfs may not be strictly necessary, but addresses

git: 'lfs' is not a git command. See 'git --help'.
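
A minimal sketch of the suggested change, assuming the legacy meta.yaml layout used in this PR. Only the two git entries are new; the compiler lines are shown for context, and whether this belongs in the top-level requirements or in an individual output is not settled here.

```yaml
requirements:
  build:
    - {{ compiler('c') }}
    - {{ compiler('cxx') }}
    # The source checkout looks for git inside the build environment.
    - git
    # Possibly optional; addresses "git: 'lfs' is not a git command".
    - git-lfs
```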

Comment thread recipes/fbgemm/meta.yaml Outdated
- name: fbgemm
build:
script: |
git submodule update --init --recursive
Member

conda should normally default to recursive checkouts of submodules? Did you test that this is required?

As an aside, why not start out with a v1 recipe right away?

Contributor Author

> conda should normally default to recursive checkouts

Right you are, I think I just missed removing this output. Will drop.
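
For context, a sketch of the kind of source section this refers to. The repository URL is upstream's; the tag pattern `v{{ version }}` is an assumption. With a `git_url` source, conda-build is expected to initialise submodules during checkout, which is why the explicit `git submodule update` step should be redundant.

```yaml
source:
  git_url: https://github.com/pytorch/FBGEMM.git
  # Hypothetical tag pattern; adjust to whatever upstream actually tags.
  git_rev: v{{ version }}
```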

> why not start out with a v1 recipe right away?

I think I read this: https://github.com/conda-forge/staged-recipes/blob/main/recipes/example-v1/README.md?plain=1#L3

> but is not yet fully supported by conda-forge's automation.

and figured I didn't know enough about conda-forge to judge whether what's "not yet fully supported" would come back to bite me (plus I was more familiar with the legacy style).

Member

> but is not yet fully supported by conda-forge's automation.

That comment is 2 years old. By now things work pretty much without a hitch.

Member

See #31836

Comment thread recipes/fbgemm/meta.yaml Outdated
Comment thread recipes/fbgemm/meta.yaml
Comment on lines +88 to +103
- python
- pip
- setuptools-git-versioning
- pytorch
- pytorch * *cuda* # [cuda_compiler_version != "None"]
- scikit-build
- tabulate
- jinja2
- pyyaml
- cuda-cudart-dev # [cuda_compiler_version != "None"]
- cuda-nvrtc-dev # [cuda_compiler_version != "None"]
- cuda-nvtx-dev # [cuda_compiler_version != "None"]
- libcublas-dev # [cuda_compiler_version != "None"]
- libcusolver-dev # [cuda_compiler_version != "None"]
- libcusparse-dev # [cuda_compiler_version != "None"]
- libcurand-dev # [cuda_compiler_version != "None"]
Member

all of this should be in host:, not build:

Contributor Author

I suspect you're probably right about SOME of these, but I switched from:

      build:
        - {{ compiler('c') }}
        - {{ compiler('cxx') }}
        - {{ compiler('cuda') }}  # [cuda_compiler_version != "None"]
        - {{ stdlib('c') }}
        - cmake
        - make
        - ninja
        - git 
        - python
        - pip 
        - setuptools-git-versioning
        - pytorch
        - pytorch * *cuda*  # [cuda_compiler_version != "None"]
        - scikit-build
        - tabulate
        - jinja2
        - pyyaml
        - cuda-cudart-dev  # [cuda_compiler_version != "None"]
        - cuda-nvrtc-dev  # [cuda_compiler_version != "None"]
        - cuda-nvtx-dev  # [cuda_compiler_version != "None"]
        - libcublas-dev  # [cuda_compiler_version != "None"]
        - libcusolver-dev  # [cuda_compiler_version != "None"]
        - libcusparse-dev  # [cuda_compiler_version != "None"]
        - libcurand-dev  # [cuda_compiler_version != "None"]
      host:
        - python
        - pip 
        - setuptools
        - setuptools-git-versioning
        - wheel
        - pytorch
        - scikit-build
        - numpy
        - cuda-version {{ cuda_compiler_version }}  # [cuda_compiler_version != "None"]

to

      build:
        - {{ compiler('c') }}
        - {{ compiler('cxx') }}
        - {{ compiler('cuda') }}  # [cuda_compiler_version != "None"]
        - {{ stdlib('c') }}
        - cmake
        - make
        - ninja
        - git 
      host:
        - python
        - pip 
        - setuptools
        - setuptools-git-versioning
        - wheel
        - pytorch
        - pytorch * *cuda*  # [cuda_compiler_version != "None"]
        - scikit-build
        - numpy
        - tabulate
        - jinja2
        - pyyaml
        - cuda-version {{ cuda_compiler_version }}  # [cuda_compiler_version != "None"]
        - cuda-cudart-dev  # [cuda_compiler_version != "None"]
        - cuda-nvrtc-dev  # [cuda_compiler_version != "None"]
        - cuda-nvtx-dev  # [cuda_compiler_version != "None"]
        - libcublas-dev  # [cuda_compiler_version != "None"]
        - libcusolver-dev  # [cuda_compiler_version != "None"]
        - libcusparse-dev  # [cuda_compiler_version != "None"]
        - libcurand-dev  # [cuda_compiler_version != "None"]

but the cuda build failed with:

[  7%] Building CXX object CMakeFiles/asmjit.dir/home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/work/external/asmjit/src/asmjit/core/builder.cpp.o
/home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/_build_env/bin/x86_64-conda-linux-gnu-c++ -DPROTOBUF_USE_DLLS -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -isystem /home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.11/site-packages/torch/include -isystem /home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include -isystem /home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/_build_env/targets/x86_64-linux/include -DNO_AVX512=1 -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -D_GLIBCXX_USE_CXX11_ABI=1 -MD -MT CMakeFiles/asmjit.dir/home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/work/external/asmjit/src/asmjit/core/builder.cpp.o -MF CMakeFiles/asmjit.dir/home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/work/external/asmjit/src/asmjit/core/builder.cpp.o.d -o CMakeFiles/asmjit.dir/home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/work/external/asmjit/src/asmjit/core/builder.cpp.o -c /home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/work/external/asmjit/src/asmjit/core/builder.cpp
In file included from /home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/ATen/cuda/CUDAContext.h:3,
                 from /home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/work/fbgemm_gpu/src/embedding_inplace_ops/embedding_inplace_update_gpu.cpp:11:
/home/conda/staged-recipes/build_artifacts/fbgemm_1767562352076/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/ATen/cuda/CUDAContextLight.h:7:10: fatal error: cusparse.h: No such file or directory
    7 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.

Comment thread recipes/fbgemm/meta.yaml
Comment thread recipes/fbgemm/meta.yaml Outdated
Comment thread recipes/fbgemm/meta.yaml Outdated
- test -f $PREFIX/lib/libfbgemm${SHLIB_EXT} # [unix]
- test -f $PREFIX/include/fbgemm/FbgemmBuild.h # [unix]

- name: fbgemm-gpu
Member

I suspect it might be better to use the name fbgemm for this output (though we can keep fbgemm-gpu as an alias wrapper if necessary)

Contributor Author

That does sound like a logical answer, but fbgemm_gpu is the actual module name.

If we changed fbgemm -> libfbgemm like you suggested, it would free up the package name, but from a user standpoint, the docs clearly show 3 related but distinct packages:

  1. FBGEMM
  2. FBGEMM_GPU <--- this output package
  3. FBGEMM_GPU_GENAI

(don't shoot me, I'm just the messenger maintainer)

So given this, calling this fbgemm would be confusing to users.

Member

fbgemm_gpu_cpu is braindead naming for something CPU-only, and the rest is not much better. There's a fundamental difference between fbgemm (what I call libfbgemm), the C++ library, and fbgemm_gpu (what I want to call fbgemm), the python bindings. The upstream naming does not reflect that at all.

> So given this, calling this fbgemm would be confusing to users.

I don't think so, or rather, it doesn't matter. If a user installs fbgemm (in my proposed naming), they'd get both the library that they want, as well as the python bindings (that they perhaps don't need). If that user cares about slimming down their environment, they can learn about the libfbgemm naming.

Summing up, this is what I mean. We can't stop upstream from loading a shotgun and blowing their feet off, but we don't have to follow suit on this particular aspect.

| thing | name upstream | name conda-forge (proposed) |
|---|---|---|
| library | fbgemm (not installable via PyPI) | libfbgemm |
| python bindings (CUDA) | fbgemm-gpu | fbgemm (build string cuda*); possible to keep upstream name as compat. wrapper |
| python bindings (CPU) | fbgemm-gpu-cpu | fbgemm (build string cpu*); possible to keep upstream name as compat. wrapper |
| genAI extension | fbgemm-gpu-genai | fbgemm-genai (depending on fbgemm) |
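
A rough sketch of how this naming could map onto recipe outputs, assuming the legacy meta.yaml format. Build scripts, the cpu/cuda build strings and most requirements are omitted; the exact pins are an assumption, and the `fbgemm-gpu` compatibility wrapper is optional.

```yaml
outputs:
  # C++ library (upstream: fbgemm)
  - name: libfbgemm

  # Python bindings (upstream: fbgemm-gpu / fbgemm-gpu-cpu)
  - name: fbgemm
    requirements:
      run:
        - {{ pin_subpackage('libfbgemm', exact=True) }}

  # genAI extension (upstream: fbgemm-gpu-genai)
  - name: fbgemm-genai
    requirements:
      run:
        - {{ pin_subpackage('fbgemm', exact=True) }}

  # Optional compatibility wrapper that keeps the upstream name installable
  - name: fbgemm-gpu
    requirements:
      run:
        - {{ pin_subpackage('fbgemm', exact=True) }}
```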

CC @conda-forge/pytorch-cpu for viz.

Contributor Author

I guess the lack of any fbgemm-gpu will ensure that users wanting that "perhaps look harder", and then perhaps we can start the description field with:

Upstream name FBGEMM_GPU

so it shows up on anaconda search pages e.g. https://anaconda.org/search?q=pytorch

Will make the change!
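
A sketch of what that could look like in the `about` section. The summary wording is a placeholder, not upstream's text; only the leading "Upstream name" line is the point.

```yaml
about:
  # Placeholder summary, for illustration only.
  summary: Python bindings for FBGEMM
  description: |
    Upstream name FBGEMM_GPU.
```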

Comment thread recipes/fbgemm/meta.yaml Outdated
Comment on lines +126 to +131
- name: fbgemm-gpu-genai
build:
skip: true # [cuda_compiler_version == "None"]
script: |
cd fbgemm_gpu
python setup.py --package_variant=genai --package_channel=release install --prefix=$PREFIX --single-version-externally-managed --record=record.txt
Member

This looks problematic to me; how does the genai variant interact with the cpu/cuda variants? I would have expected this output to depend on the previous one. Otherwise, we would have to implement pretty complex mutex rules, which is definitely not the right approach here.

Contributor Author

See my comment above, #31820 (comment): while it uses the same setup.py file, it's really a whole different codebase.

I will say there's some overlap, but fbgemm-gpu and fbgemm-gpu-genai both have lots that the other doesn't.

Comment thread recipes/fbgemm/meta.yaml
-DCMAKE_PREFIX_PATH=$PREFIX \
-DCMAKE_BUILD_TYPE=Release \
-DFBGEMM_LIBRARY_TYPE=shared \
-DASMJIT_STATIC=OFF \
Member

Can we build this on top of #31498 (comment)? You can have several recipes per PR; if fbgemm depends on asmjit, CI here will determine the DAG and build the recipes in the correct order (so that fbgemm can depend on asmjit)
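
A sketch of the dependency this would introduce, assuming the asmjit recipe lives under recipes/asmjit in the same PR (the linter output above already lists recipes/asmjit/meta.yaml). How FBGEMM's CMake is pointed at an external asmjit is not shown and would need checking.

```yaml
# recipes/fbgemm/meta.yaml (fragment)
requirements:
  host:
    # Provided by recipes/asmjit in the same PR; CI orders the builds.
    - asmjit
```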

@das-intensity
Contributor Author

The cuda build timed out after 6hrs. Unfortunately I'm not surprised. It didn't take quite that long on my local machine IIRC, but my local is pretty powerful.

How can I go about debugging the pipeline machines? E.g. how can I know if it's fully pegging the CPU, such that perhaps a cmake/etc flag might help.

@h-vetinari
Member

h-vetinari commented Jan 3, 2026

> How can I go about debugging the pipeline machines? E.g. how can I know if it's fully pegging the CPU, such that perhaps a cmake/etc flag might help.

Most effective is reducing the GPU arches for now (to a single one). We can switch to cirun once the feedstock is created.
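
A sketch of restricting the build to one architecture, assuming the fbgemm_gpu build honours the `TORCH_CUDA_ARCH_LIST` environment variable as PyTorch extension builds generally do. The value `8.0` is only an example, and the install command is the one from the recipe under review.

```yaml
build:
  script: |
    # Assumption: the build picks this up; 8.0 is an arbitrary single arch.
    export TORCH_CUDA_ARCH_LIST="8.0"
    cd fbgemm_gpu
    python setup.py --package_variant=genai --package_channel=release install --prefix=$PREFIX --single-version-externally-managed --record=record.txt
```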
