Add tegra variant for CUDA 12.9 by traversaro · Pull Request #485 · conda-forge/pytorch-cpu-feedstock

traversaro · 2026-02-12T09:37:09Z

Checklist

Used a personal fork of the feedstock to propose changes
Bumped the build number (if the version is unchanged)
Reset the build number to 0 (if the version changed)
Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
Ensured the license file is being packaged.

traversaro · 2026-02-12T09:37:24Z

@conda-forge-admin, please rerender

conda-forge-admin · 2026-02-12T09:38:38Z

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

❌ The recipe is not parsable by any of the known recipe parsers (['conda-forge-tick (the bot)', 'conda-recipe-manager', 'conda-souschef (grayskull)']). Please check the logs for more information and ensure your recipe can be parsed.

For recipe/meta.yaml:

ℹ️ The magma output has been superseded by libmagma-devel.
ℹ️ The recipe is not parsable by parser conda-forge-tick (the bot). Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.
ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/21941319594. Examine the logs at this URL for more detail.


conda-forge-admin · 2026-02-12T10:46:27Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

ℹ️ The magma output has been superseded by libmagma-devel.
ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/22300308921. Examine the logs at this URL for more detail.


h-vetinari · 2026-02-12T12:29:21Z

@traversaro, I got CI running here; despite the reduced CUDA arches, I kinda expect this one to time out; I can help you trigger CI on the opengpu server if you wanna switch back.

traversaro · 2026-02-12T12:30:54Z

> @traversaro, I got CI running here; despite the reduced CUDA arches, I kinda expect this one to time out; I can help you trigger CI on the opengpu server if you wanna switch back.

Thanks! Let's see how the build on Azure goes. I had some errors when building locally, and I was not sure whether they were real or an artifact of my local system, so Azure reaching ~6h of runtime would already be a good experiment.

traversaro · 2026-02-12T15:07:48Z

> @traversaro, I got CI running here; despite the reduced CUDA arches, I kinda expect this one to time out; I can help you trigger CI on the opengpu server if you wanna switch back.

@h-vetinari I think you were right: the job apparently died after ~2h due to OOM. Can you switch it back to cirun? Thanks!


carterbox · 2026-02-13T00:52:11Z

Tegra-related recipe modifications LGTM.

traversaro · 2026-02-13T08:28:14Z

I noticed that both these new tegra jobs and the existing cuda + arm builds are running on cirun-openstack-gpu-2xlarge. However, since they are cross-compiling and certainly do not run the tests on a GPU, we can probably move them to cirun-openstack-cpu-xlarge? I definitely do not understand the logic behind the dispatching of the cirun jobs between cpu and gpu, so I am not sure if this is possible at all.

mgorny · 2026-02-13T16:26:30Z

> I noticed that both these new tegra jobs and the existing cuda + arm builds are running on cirun-openstack-gpu-2xlarge. However, since they are cross-compiling and certainly do not run the tests on a GPU, we can probably move them to cirun-openstack-cpu-xlarge? I definitely do not understand the logic behind the dispatching of the cirun jobs between cpu and gpu, so I am not sure if this is possible at all.

I think you are correct, but I'd wait for @h-vetinari to confirm.

mgorny

FWICS, after applying this change we no longer build any sbsa variant (as in after removing the testing skip and rerendering). Is this intentional?

h-vetinari · 2026-02-14T02:02:20Z

> I noticed that both these new tegra jobs and the existing cuda + arm builds are running on cirun-openstack-gpu-2xlarge. However, since they are cross-compiling and certainly do not run the tests on a GPU, we can probably move them to cirun-openstack-cpu-xlarge?

Not sure from where, but my brain is convinced that I've heard somewhere that the GPUs are actually useable even in emulation (which had surprised me when I heard about it, which is why I remember it). I have no source for this though, and ultimately it could be that I'm misremembering something. In any case I agree that we don't need to run this on GPU agents.

> I definitely do not understand the logic behind the dispatching of the cirun jobs between cpu and gpu, so I am not sure if this is possible at all.

It's unnecessarily complicated because the IMO obvious choice (adding the runner type into the zip that determines CPU-vs.-CUDA) was controversial. There's a compromise solution that I haven't gotten around to implementing yet (the current approach of creating a cartesian product of variants and then skipping the unwanted combinations is a hack that I want to get rid of). It can be modified to do what you're suggesting, but it's painful.
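The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the compiler values are hypothetical stand-ins, and the runner labels are borrowed from the discussion above rather than from the feedstock's actual variant configuration.

```python
from itertools import product

# Hypothetical variant axes; the runner labels come from the discussion above.
compilers = ["None", "cuda-nvcc"]
runners = ["cirun-openstack-cpu-xlarge", "cirun-openstack-gpu-2xlarge"]

# Approach described as the current hack: build the full cartesian product,
# then skip the combinations that are not wanted.
wanted = {
    ("None", "cirun-openstack-cpu-xlarge"),
    ("cuda-nvcc", "cirun-openstack-gpu-2xlarge"),
}
all_combos = list(product(compilers, runners))
via_product = [combo for combo in all_combos if combo in wanted]

# Zip approach: pair the keys position by position, so the unwanted
# combinations are never generated in the first place.
via_zip = list(zip(compilers, runners))
```

Both lists end up with the same two pairs, but the product version has to enumerate four combinations and maintain an explicit skip list, which grows quickly as more axes are added.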

h-vetinari · 2026-02-14T02:04:09Z

BTW, the build seems to fail with a compiler error in GCC:

In file included from $SRC_DIR/build/aten/src/ATen/native/cpu/Unfold2d.cpp.SVE256.cpp:1:
$SRC_DIR/aten/src/ATen/native/cpu/Unfold2d.cpp: In function 'void at::native::{anonymous}::unfolded2d_acc_kernel(c10::ScalarType, void*, void*, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, bool)':
$SRC_DIR/aten/src/ATen/native/cpu/Unfold2d.cpp:225:1: error: unrecognizable insn:
  225 | }
      | ^
(insn 1387 1386 1388 105 (set (reg:VNx16BI 3248)
        (unspec:VNx16BI [
                (reg:VNx16BI 3245)
                (reg:VNx8BI 3247)
                (const_vector:VNx4BI [
                        (const_int 0 [0]) repeated x8
                    ])
            ] UNSPEC_TRN1_CONV)) "$SRC_DIR/torch/headeronly/util/bit_cast.h":40:14 -1
     (nil))
during RTL pass: vregs
$SRC_DIR/aten/src/ATen/native/cpu/Unfold2d.cpp:225:1: internal compiler error: in extract_insn, at recog.cc:2812
Please submit a full bug report, with preprocessed source (by using -freport-bug).
See <https://github.com/conda-forge/ctng-compilers-feedstock/issues/new/choose> for instructions.

You may want to try GCC 15 (by overriding the version in the tegra migrator, adding use_local: true, and rerendering again)

traversaro · 2026-02-14T13:52:52Z

The issue is apparently https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121027, tracked in the PyTorch issue tracker at pytorch/pytorch#172630. A workaround was added upstream in pytorch/pytorch#174647, so I think I will try to backport that patch.


h-vetinari · 2026-02-16T03:34:04Z

Still missing triton

Could not solve for environment specs
The following packages are incompatible
├─ arm-variant =* tegra is requested and can be installed;
├─ pytorch-gpu =2.10.0 cuda129_generic_h038ea1a_202 is not installable because it requires
│  ├─ arm-variant =* tegra, which can be installed;
│  └─ pytorch ==2.10.0 cuda*_generic*202 but there are no viable options
│     ├─ pytorch 2.10.0 would require
│     │  └─ triton ==3.6.0 *, which requires
│     │     └─ arm-variant =* sbsa, which conflicts with any installable versions previously reported;
│     ├─ pytorch 2.10.0 would require
│     │  └─ libtorch ==2.10.0 cuda129_generic_h849e120_202, which conflicts with any installable versions previously reported;
│     └─ pytorch 2.10.0 would require
│        └─ libtorch ==2.10.0 cuda130_generic_h5a90b99_202, which conflicts with any installable versions previously reported;
└─ pytorch =2.10.0 cuda*_generic*202, which cannot be installed (as previously explained).
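The solver output boils down to one clash: the environment requests arm-variant tegra, while the only available triton build requires arm-variant sbsa. A toy model of that check follows, assuming a hypothetical `find_conflicts` helper and a simplified view in which each package carries at most one arm-variant pin. The real solver does far more than this.

```python
def find_conflicts(requested_variant, package_pins):
    """Return the packages whose arm-variant pin differs from the requested one."""
    return sorted(
        name
        for name, pin in package_pins.items()
        if pin is not None and pin != requested_variant
    )

# Pins taken from the solver message above (simplified to one pin per package).
pins = {
    "pytorch-gpu": "tegra",
    "triton": "sbsa",
}

conflicts = find_conflicts("tegra", pins)
```

In this model the only offender is triton, which matches the "Still missing triton" diagnosis: a tegra build of triton has to exist before the pytorch tegra variant becomes installable.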

traversaro · 2026-02-16T18:46:22Z

> Still missing triton

conda-forge/triton-feedstock#65

Tobias-Fischer · 2026-02-20T03:05:12Z

@conda-forge-admin please restart ci

h-vetinari · 2026-02-20T04:12:57Z

Please don't try to use the bots on feedstocks that need cirun. I could have just restarted the CI manually on the previous commit here; after the bot's close-and-reopen, the server can't pick it up anymore (apparently).

Tobias-Fischer · 2026-02-20T04:14:13Z

> Please don't try to use the bots on feedstocks that need cirun. I could have just restarted the CI manually on the previous commit here; after the bot's close-and-reopen, the server can't pick it up anymore (apparently).

Ah sorry I didn't realise :(

h-vetinari · 2026-02-20T04:16:01Z

No worries. It's a bit of a trap unfortunately; the bots work well in general, but we need to keep them away from cirun to avoid DoS possibilities (whether intentional or accidental).

h-vetinari · 2026-02-20T04:34:20Z

There seems to be some other problem. The server should have capacity, but the jobs aren't starting. I'll keep an eye on it to see when it recovers; otherwise I'll have to chase this down with the help of Amit.

h-vetinari · 2026-02-21T02:32:23Z

Hm; seems we get less far now than we did previously

$SRC_DIR/torch/csrc/jit/python/init.cpp: In lambda function:
$SRC_DIR/torch/csrc/jit/python/init.cpp:1833:32: error: inconsistent types 'pybind11::typing::Tuple<pybind11::none, pybind11::none>' and 'pybind11::typing::Tuple<pybind11::cpp_function&, pybind11::list&>' deduced for lambda return type
 1833 |           return py::make_tuple(func, overload_names);
      |                  ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
$SRC_DIR/torch/csrc/jit/python/init.cpp: In lambda function:
$SRC_DIR/torch/csrc/jit/python/init.cpp:1860:32: error: inconsistent types 'pybind11::typing::Tuple<bool, pybind11::object&>' and 'pybind11::typing::Tuple<bool, pybind11::none>' deduced for lambda return type
 1860 |           return py::make_tuple(false, py::none());
      |                  ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~

AdamDHines · 2026-02-21T02:36:08Z

Possibly related: I built locally and ran into a similar error, but when I pinned pybind11 >=2.11,<2.13 and pybind11-abi <=4 it built without issue, and it is running locally on our Jetson Orin.

Tobias-Fischer · 2026-02-21T03:11:23Z

pybind11 3.0.1 should do the job, see pytorch/pytorch#175115 where upstream runs into the same problem with 3.0.2


mgorny · 2026-02-21T09:28:46Z

You need to update CUDA_TARGET too, I think.

It's a bit unfortunate that CI passes without changes for this. I guess we never exercise the CUDA paths in emulation.

PS. Though of course the fact that CI passes otherwise is great! 🥳

Thanks, done in b2dd486. I guess this is useful to avoid having outdated code and for #487, but I am not sure the sed is doing anything: I can't find any occurrence of CUDA_TARGET in https://github.com/pytorch/pytorch/blob/v2.10.0/torch/_inductor/cpp_builder.py.

See

pytorch-cpu-feedstock/recipe/patches/0006-Add-conda-prefix-to-inductor-include-lib-paths.patch

Line 29 in 4070d99

+ + [sysconfig.get_config_var('prefix') + '/targets/@CUDA_TARGET@/include']

It's used in our patches

Ahh, thanks!
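To make the role of the placeholder concrete, here is a minimal sketch of the substitution step. It assumes the build script replaces `@CUDA_TARGET@` textually (the discussion mentions a sed). The helper name `fill_cuda_target` and the target directory names (`sbsa-linux`, `aarch64-linux`) are illustrative assumptions, not values copied from the recipe.

```python
# The line added by the feedstock patch, with the diff marker stripped.
PATCH_LINE = "+ [sysconfig.get_config_var('prefix') + '/targets/@CUDA_TARGET@/include']"

def fill_cuda_target(text, arm_variant):
    """Replace the @CUDA_TARGET@ placeholder with a per-variant directory name."""
    # Assumed mapping from arm variant to CUDA target directory name.
    targets = {"sbsa": "sbsa-linux", "tegra": "aarch64-linux"}
    return text.replace("@CUDA_TARGET@", targets[arm_variant])

tegra_line = fill_cuda_target(PATCH_LINE, "tegra")
sbsa_line = fill_cuda_target(PATCH_LINE, "sbsa")
```

Because the placeholder lives in the feedstock's own patch rather than in upstream `cpp_builder.py`, searching the upstream file for CUDA_TARGET finds nothing, even though the substitution still matters for the patched result.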

traversaro added 2 commits February 12, 2026 10:35

Add arm_variant_type.yaml migration

70eee41

Skip non-tegra build for debug

5f1628b

traversaro requested review from Tobias-Fischer, baszalmstra, beckermr, benjaminrwilson, h-vetinari, hmaarrfk, jeongseok-meta, mgorny and sodre as code owners February 12, 2026 09:37

traversaro mentioned this pull request Feb 12, 2026

Add support for Jetson Orin embedded cards (i.e. add "8.7" to TORCH_CUDA_ARCH_LIST in linux-aarch64) ? #303

Closed

1 task

MNT: Re-rendered with conda-smithy 3.54.2 and conda-forge-pinning 202…

cacbd8e

…6.02.12.01.37.5

h-vetinari added 2 commits February 12, 2026 21:50

debug on azure

2c8e0eb

MNT: Re-rendered with conda-smithy 3.54.2 and conda-forge-pinning 202…

4648c9e

…6.02.12.01.37.56 Other tools: - conda-build 26.1.0 - rattler-build 0.55.1 - rattler-build-conda-compat 1.4.10

h-vetinari force-pushed the fix303 branch from 90355ab to 4648c9e Compare February 12, 2026 10:51

h-vetinari added 2 commits February 13, 2026 09:18

back to cirun

73c7cf9

MNT: Re-rendered with conda-smithy 3.54.3.dev21+ga2f045da7 and conda-…

7ac7dd3

…forge-pinning 2026.02.12.10.32.49 Other tools: - conda-build 26.1.0 - rattler-build 0.57.0 - rattler-build-conda-compat 1.4.10

mgorny reviewed Feb 13, 2026


Comment thread recipe/build.sh Outdated

traversaro and others added 3 commits February 14, 2026 15:03

Backport patch to avoid ICE on GCC 14 on arm

152c852

Simplify tegra if clause

3ad1c6d

Co-authored-by: Michał Górny <mgorny@gentoo.org>

trigger CI

81cc8f5

h-vetinari mentioned this pull request Feb 17, 2026

Windows compilation issues with std:: and compiled_autograd.h #459

Closed

Tobias-Fischer mentioned this pull request Feb 18, 2026

pytorch 2.10 fails CUDA detection for cpp extensions #478

Closed

h-vetinari mentioned this pull request Feb 18, 2026

Find target-specific CUDA paths by default #487

Closed

conda-forge-webservices Bot closed this Feb 20, 2026

conda-forge-webservices Bot reopened this Feb 20, 2026

trigger CI

e311f43

h-vetinari added 2 commits February 21, 2026 15:11

remove superfluous test dependency of pytorch-tests

900769d

pytorch itself already depends on pybind11 at runtime

avoid pybind11 v3.0.2

cccfda0

mgorny requested changes Feb 21, 2026


Set CUDA_TARGET appropriately for tegra builds

b2dd486

h-vetinari mentioned this pull request Feb 24, 2026

Tegra, deactivation, CUDA, autograd #491

Merged

h-vetinari closed this in #491 Feb 27, 2026
