Add tegra variant for CUDA 12.9 #485

Closed
traversaro wants to merge 14 commits into conda-forge:main from traversaro:fix303

Conversation

@traversaro
Contributor

traversaro commented Feb 12, 2026

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@traversaro
Contributor Author

@conda-forge-admin, please rerender

@conda-forge-admin
Contributor

conda-forge-admin commented Feb 12, 2026

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • ❌ The recipe is not parsable by any of the known recipe parsers (['conda-forge-tick (the bot)', 'conda-recipe-manager', 'conda-souschef (grayskull)']). Please check the logs for more information and ensure your recipe can be parsed.

For recipe/meta.yaml:

  • ℹ️ The magma output has been superseded by libmagma-devel.
  • ℹ️ The recipe is not parsable by parser conda-forge-tick (the bot). Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/21941319594. Examine the logs at this URL for more detail.

@conda-forge-admin
Contributor

conda-forge-admin commented Feb 12, 2026

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The magma output has been superseded by libmagma-devel.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/22300308921. Examine the logs at this URL for more detail.

…6.02.12.01.37.56

Other tools:
- conda-build 26.1.0
- rattler-build 0.55.1
- rattler-build-conda-compat 1.4.10
@h-vetinari
Member

@traversaro, I got CI running here; despite the reduced CUDA arches, I kinda expect this one to time out; I can help you trigger CI on the opengpu server if you wanna switch back.

@traversaro
Contributor Author

> @traversaro, I got CI running here; despite the reduced CUDA arches, I kinda expect this one to time out; I can help you trigger CI on the opengpu server if you wanna switch back.

Thanks! Let's see how the build on Azure goes. I had some errors when building locally, and I was not sure whether they were real or an artifact of my local system. Azure reaching ~6h of runtime would already be a good experiment.

@traversaro
Contributor Author

traversaro commented Feb 12, 2026

> @traversaro, I got CI running here; despite the reduced CUDA arches, I kinda expect this one to time out; I can help you trigger CI on the opengpu server if you wanna switch back.

@h-vetinari I think you were right: apparently the job died after ~2h due to OOM. Can you switch it back to cirun? Thanks!

…forge-pinning 2026.02.12.10.32.49

Other tools:
- conda-build 26.1.0
- rattler-build 0.57.0
- rattler-build-conda-compat 1.4.10
@carterbox
Member

Tegra-related recipe modifications LGTM.

@traversaro
Contributor Author

I noticed that both these new tegra jobs and the existing cuda + arm builds are running on cirun-openstack-gpu-2xlarge. However, since they are cross-compiling and certainly do not run the tests on a GPU, we can probably move them to cirun-openstack-cpu-xlarge? I definitely do not understand the logic behind the dispatching of the cirun jobs between CPU and GPU, so I am not sure if this is possible at all.

@mgorny
Contributor

mgorny commented Feb 13, 2026

> I noticed that both these new tegra jobs and the existing cuda + arm builds are running on cirun-openstack-gpu-2xlarge. However, since they are cross-compiling and certainly do not run the tests on a GPU, we can probably move them to cirun-openstack-cpu-xlarge? I definitely do not understand the logic behind the dispatching of the cirun jobs between CPU and GPU, so I am not sure if this is possible at all.

I think you are correct, but I'd wait for @h-vetinari to confirm.

Contributor

mgorny left a comment

FWICS, after applying this change we no longer build any sbsa variant (as in after removing the testing skip and rerendering). Is this intentional?

Comment thread recipe/build.sh Outdated
@h-vetinari
Member

> I noticed that both these new tegra jobs and the existing cuda + arm builds are running on cirun-openstack-gpu-2xlarge. However, since they are cross-compiling and certainly do not run the tests on a GPU, we can probably move them to cirun-openstack-cpu-xlarge?

Not sure from where, but my brain is convinced that I've heard somewhere that the GPUs are actually useable even in emulation (which had surprised me when I heard about it, which is why I remember it). I have no source for this though, and ultimately it could be that I'm misremembering something. In any case I agree that we don't need to run this on GPU agents.

> I definitely do not understand the logic behind the dispatching of the cirun jobs between CPU and GPU, so I am not sure if this is possible at all.

It's unnecessarily complicated because the IMO obvious choice (adding the runner type into the zip that determines CPU-vs.-CUDA) was controversial. There's a compromise solution that I haven't gotten around to implementing yet (the current approach of creating a cartesian product of variants and then skipping the unwanted combinations is a hack that I want to get rid of). It can be modified to do what you're suggesting, but it's painful.
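
The difference between the two approaches mentioned above can be sketched in a few lines of Python. This is only an illustration: the key names, the runner labels, and the `wanted` rule are hypothetical and not taken from the feedstock's actual configuration.

```python
from itertools import product

# Hypothetical variant values; the real feedstock config uses different names.
cuda_versions = ["None", "12.9"]
runners = ["cpu-runner", "gpu-runner"]

# Zip approach: values are paired by position, so only wanted combinations exist.
zipped = list(zip(cuda_versions, runners))

# Product-then-skip approach: build every combination, then drop unwanted ones.
all_combos = list(product(cuda_versions, runners))


def wanted(cuda: str, runner: str) -> bool:
    # Assumed rule for illustration: CPU builds go to the CPU runner,
    # CUDA builds go to the GPU runner.
    return (cuda == "None") == (runner == "cpu-runner")


filtered = [combo for combo in all_combos if wanted(*combo)]

print(len(all_combos), "combinations before skipping")
print(filtered)
```

With these toy values both routes end up with the same two builds, but the second one needs an explicit skip rule for every unwanted pairing, which is the part described as a hack.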

@h-vetinari
Member

BTW, the build seems to fail with a compiler error in GCC:

In file included from $SRC_DIR/build/aten/src/ATen/native/cpu/Unfold2d.cpp.SVE256.cpp:1:
$SRC_DIR/aten/src/ATen/native/cpu/Unfold2d.cpp: In function 'void at::native::{anonymous}::unfolded2d_acc_kernel(c10::ScalarType, void*, void*, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, bool)':
$SRC_DIR/aten/src/ATen/native/cpu/Unfold2d.cpp:225:1: error: unrecognizable insn:
  225 | }
      | ^
(insn 1387 1386 1388 105 (set (reg:VNx16BI 3248)
        (unspec:VNx16BI [
                (reg:VNx16BI 3245)
                (reg:VNx8BI 3247)
                (const_vector:VNx4BI [
                        (const_int 0 [0]) repeated x8
                    ])
            ] UNSPEC_TRN1_CONV)) "$SRC_DIR/torch/headeronly/util/bit_cast.h":40:14 -1
     (nil))
during RTL pass: vregs
$SRC_DIR/aten/src/ATen/native/cpu/Unfold2d.cpp:225:1: internal compiler error: in extract_insn, at recog.cc:2812
Please submit a full bug report, with preprocessed source (by using -freport-bug).
See <https://github.com/conda-forge/ctng-compilers-feedstock/issues/new/choose> for instructions.

You may want to try GCC 15 (by overriding the version in the tegra migrator, adding use_local: true, and rerendering again)

@traversaro
Contributor Author

The issue is apparently https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121027, tracked in the PyTorch issue tracker at pytorch/pytorch#172630. A workaround was added upstream in pytorch/pytorch#174647, so I think I will try to backport that patch.

@h-vetinari
Member

Still missing triton

Could not solve for environment specs
The following packages are incompatible
├─ arm-variant =* tegra is requested and can be installed;
├─ pytorch-gpu =2.10.0 cuda129_generic_h038ea1a_202 is not installable because it requires
│  ├─ arm-variant =* tegra, which can be installed;
│  └─ pytorch ==2.10.0 cuda*_generic*202 but there are no viable options
│     ├─ pytorch 2.10.0 would require
│     │  └─ triton ==3.6.0 *, which requires
│     │     └─ arm-variant =* sbsa, which conflicts with any installable versions previously reported;
│     ├─ pytorch 2.10.0 would require
│     │  └─ libtorch ==2.10.0 cuda129_generic_h849e120_202, which conflicts with any installable versions previously reported;
│     └─ pytorch 2.10.0 would require
│        └─ libtorch ==2.10.0 cuda130_generic_h5a90b99_202, which conflicts with any installable versions previously reported;
└─ pytorch =2.10.0 cuda*_generic*202, which cannot be installed (as previously explained).
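
The core of the conflict in the solver output above can be reproduced in miniature. The following is a toy model only, not how the real solver works: it records which `arm-variant` value each package requires (as read from the output above) and reports the packages that disagree with the requested one.

```python
# Toy model of the conflict reported above; the real solver is far more general.
# Each entry maps a package to the arm-variant value it requires.
REQUIRES_ARM_VARIANT = {
    "pytorch-gpu 2.10.0": "tegra",
    "triton 3.6.0": "sbsa",
}


def conflicting(requested: str, requirements: dict[str, str]) -> list[str]:
    """Return the packages whose required arm-variant differs from the requested one."""
    return sorted(
        name for name, variant in requirements.items() if variant != requested
    )


conflicts = conflicting("tegra", REQUIRES_ARM_VARIANT)
print(conflicts)
```

In this simplified picture the only offender is triton, which matches the "Still missing triton" summary: a tegra build of triton has to exist before the tegra pytorch build becomes installable.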

@traversaro
Contributor Author

> Still missing triton

conda-forge/triton-feedstock#65

@Tobias-Fischer
Contributor

@conda-forge-admin please restart ci

@h-vetinari
Member

h-vetinari commented Feb 20, 2026

Please don't try to use the bots on feedstocks that need cirun. I could have just restarted the CI manually on the previous commit here; with the bot's reopen, the server can't pick it up anymore (apparently).

@Tobias-Fischer
Contributor

> Please don't try to use the bots on feedstocks that need cirun. I could have just restarted the CI manually on the previous commit here; with the bot's reopen, the server can't pick it up anymore (apparently).

Ah sorry I didn't realise :(

@h-vetinari
Member

No worries. It's a bit of a trap unfortunately; the bots work well in general, but we need to keep them away from cirun to avoid DoS possibilities (whether intentional or accidental).

@h-vetinari
Member

There seems to be some other problem. The server should have capacity, but the jobs aren't starting. I'll keep an eye on it to see when it recovers; otherwise I'll have to go chase this down with the help of Amit.

@h-vetinari
Member

Hm; seems we get less far now than we did previously

$SRC_DIR/torch/csrc/jit/python/init.cpp: In lambda function:
$SRC_DIR/torch/csrc/jit/python/init.cpp:1833:32: error: inconsistent types 'pybind11::typing::Tuple<pybind11::none, pybind11::none>' and 'pybind11::typing::Tuple<pybind11::cpp_function&, pybind11::list&>' deduced for lambda return type
 1833 |           return py::make_tuple(func, overload_names);
      |                  ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
$SRC_DIR/torch/csrc/jit/python/init.cpp: In lambda function:
$SRC_DIR/torch/csrc/jit/python/init.cpp:1860:32: error: inconsistent types 'pybind11::typing::Tuple<bool, pybind11::object&>' and 'pybind11::typing::Tuple<bool, pybind11::none>' deduced for lambda return type
 1860 |           return py::make_tuple(false, py::none());
      |                  ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~

@AdamDHines
Member

AdamDHines commented Feb 21, 2026

Possibly related: I built locally and ran into a similar error, but when I pinned pybind11 >=2.11,<2.13 and pybind-abi <=4 it built without issue, and it is now running locally on our Jetson Orin.

@Tobias-Fischer
Contributor

pybind11 3.0.1 should do the job; see pytorch/pytorch#175115, where upstream runs into the same problem with 3.0.2.

Comment thread recipe/build.sh Outdated
Contributor

You need to update CUDA_TARGET too, I think.

Member

h-vetinari commented Feb 22, 2026

It's a bit unfortunate that CI passes without changes for this. I guess we never exercise the CUDA paths in emulation.

PS. Though of course the fact that CI passes otherwise is great! 🥳

Contributor Author

Thanks, done in b2dd486. I guess this is useful to avoid having outdated code and for #487, but I am not sure whether the sed is doing anything: I can't find any occurrence of CUDA_TARGET in torch/_inductor/cpp_builder.py at v2.10.0 (https://github.com/pytorch/pytorch/blob/v2.10.0/torch/_inductor/cpp_builder.py).

Contributor

See

+ + [sysconfig.get_config_var('prefix') + '/targets/@CUDA_TARGET@/include']

Member

It's used in our patches:

+ + [sysconfig.get_config_var('prefix') + '/targets/@CUDA_TARGET@/include']
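
To make the role of the placeholder concrete, here is a minimal Python sketch of the substitution that the sed call in build.sh presumably performs on the patched file. The helper name `fill_cuda_target` is hypothetical, and `aarch64-linux` is an assumed target value for the tegra variant, not something confirmed in this thread.

```python
# The patched expression quoted above, held as a plain string.
PATCHED_LINE = (
    "[sysconfig.get_config_var('prefix') + '/targets/@CUDA_TARGET@/include']"
)


def fill_cuda_target(text: str, cuda_target: str) -> str:
    """Replace every @CUDA_TARGET@ placeholder with the given target directory name."""
    return text.replace("@CUDA_TARGET@", cuda_target)


# "aarch64-linux" is an assumption made for this example only.
filled = fill_cuda_target(PATCHED_LINE, "aarch64-linux")
print(filled)
```

Because the placeholder lives in the feedstock's patch rather than in upstream `cpp_builder.py`, searching the upstream file for CUDA_TARGET finds nothing even though the substitution still matters after the patch is applied.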

Contributor Author

Ahh, thanks!


7 participants