Add tegra variant for CUDA 12.9 #485
|
@conda-forge-admin, please rerender |
|
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR ( Here's what I've got... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/21941319594. Examine the logs at this URL for more detail. |
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/22300308921. Examine the logs at this URL for more detail. |
…6.02.12.01.37.56 Other tools: - conda-build 26.1.0 - rattler-build 0.55.1 - rattler-build-conda-compat 1.4.10
|
@traversaro, I got CI running here; despite the reduced CUDA arches, I kinda expect this one to time out; I can help you trigger CI on the opengpu server if you wanna switch back. |
Thanks! Let's see how the build on azure goes. I had some errors when building locally, and I was not sure whether they were real or an artifact of my local system; azure reaching ~6h of run time would already be a good experiment. |
@h-vetinari I think you were right, apparently the job died after ~2h due to OOM, can you switch it back to cirun? Thanks! |
…forge-pinning 2026.02.12.10.32.49 Other tools: - conda-build 26.1.0 - rattler-build 0.57.0 - rattler-build-conda-compat 1.4.10
|
Tegra-related recipe modifications LGTM. |
|
I noticed that both these new tegra jobs and the existing cuda + arm builds are running in |
I think you are correct, but I'd wait for @h-vetinari to confirm. |
mgorny
left a comment
FWICS, after applying this change we no longer build any sbsa variant (as in after removing the testing skip and rerendering). Is this intentional?
Not sure from where, but my brain is convinced that I've heard somewhere that the GPUs are actually useable even in emulation (which had surprised me when I heard about it, which is why I remember it). I have no source for this though, and ultimately it could be that I'm misremembering something. In any case I agree that we don't need to run this on GPU agents.
It's unnecessarily complicated because the IMO obvious choice (adding the runner type into the zip that determines CPU-vs.-CUDA) was controversial. There's a compromise solution that I haven't gotten around to implementing yet (the current approach of creating a cartesian product of variants and then skipping the unwanted combinations is a hack that I want to get rid of). It can be modified to do what you're suggesting, but it's painful. |
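For readers unfamiliar with the two approaches contrasted above, here is a minimal sketch in plain Python. It is not the feedstock's actual configuration: the variant values, the runner names, and the helper functions are all hypothetical and only illustrate the difference between zipping two axes together and building a cartesian product that is then filtered.

```python
from itertools import product

# Hypothetical variant axes, for illustration only (not the real feedstock config).
cuda_versions = ["None", "12.9"]
runners = ["cpu-runner", "gpu-runner"]


def zipped_variants(cuda_versions, runners):
    """Zip approach: the i-th CUDA setting is paired with the i-th runner."""
    return list(zip(cuda_versions, runners))


def is_unwanted(cuda, runner):
    """Hypothetical skip rule: CUDA builds belong on the GPU runner, CPU builds on the CPU runner."""
    uses_cuda = cuda != "None"
    return uses_cuda != (runner == "gpu-runner")


def product_with_skips(cuda_versions, runners):
    """Cartesian-product approach: build every combination, then skip the unwanted ones."""
    return [(c, r) for c, r in product(cuda_versions, runners) if not is_unwanted(c, r)]


print(zipped_variants(cuda_versions, runners))
print(product_with_skips(cuda_versions, runners))
```

Under these assumptions both functions end up with the same two variants, but the second one first generates four combinations and needs a separate skip rule to discard half of them, and that rule has to be kept in sync by hand whenever an axis changes.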
|
BTW, the build seems to fail with a compiler error in GCC: You may want to try GCC 15 (by overriding the version in the tegra migrator, adding |
The issue is apparently https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121027 and tracked in pytorch issue tracker at pytorch/pytorch#172630 . A workaround was added upstream in pytorch/pytorch#174647, so I think I will try to backport that patch. |
Co-authored-by: Michał Górny <mgorny@gentoo.org>
|
Still missing triton |
|
|
@conda-forge-admin please restart ci |
|
Please don't try to use the bots on feedstocks that need cirun. I could have just restarted the CI manually on the previous commit here; with the bot reopen, the server can't pick it up anymore (apparently). |
Ah sorry I didn't realise :( |
|
No worries. It's a bit of a trap unfortunately; the bots work well in general, but we need to keep them away from cirun to avoid DoS possibilities (whether intentional or accidental). |
|
There seems to be some other problem. The server should have capacity, but the jobs aren't starting. I'll keep an eye on it to see when it recovers; otherwise I'll have to go chase this down with the help of Amit
|
Hm; seems we get less far now than we did previously |
Possibly related - I built locally and ran into a similar error on building, but when I pinned |
|
pybind11 3.0.1 should do the job, see pytorch/pytorch#175115 where upstream runs into the same problem with 3.0.2 |
pytorch itself already depends on pybind11 at runtime
You need to update CUDA_TARGET too, I think.
It's a bit unfortunate that CI passes without changes for this. I guess we never exercise the CUDA paths in emulation.
PS. Though of course the fact that CI passes otherwise is great! 🥳
Thanks, done in b2dd486. I guess this is useful to avoid having outdated code and for #487, but I am not sure if the sed is doing anything: I can't find any occurrence of CUDA_TARGET in https://github.com/pytorch/pytorch/blob/v2.10.0/torch/_inductor/cpp_builder.py .
It's used in our patches
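As an aside on the "is the sed doing anything" question, one way to check is to count how many replacements a substitution actually makes on the source as it looks after the patches are applied. The sketch below uses Python's `re.subn` rather than sed itself; the sample strings and the replacement value are hypothetical and are not taken from `cpp_builder.py` or from the feedstock's patches.

```python
import re


def count_substitutions(text, pattern, replacement):
    """Return (new_text, n), where n is the number of matches that were replaced."""
    return re.subn(pattern, replacement, text)


# Hypothetical file contents, for illustration only.
unpatched = 'flags = ["-O2"]\n'
patched = 'target = "CUDA_TARGET"\nflags = ["-O2"]\n'

for name, text in [("unpatched", unpatched), ("patched", patched)]:
    # "example-target" is a made-up replacement value.
    new_text, n = count_substitutions(text, r"CUDA_TARGET", "example-target")
    if n == 0:
        print(f"{name}: pattern not found, the substitution is a no-op")
    else:
        print(f"{name}: replaced {n} occurrence(s)")
```

A zero count on the upstream file alone would not prove the substitution is dead code, since the placeholder can be introduced by a patch that is applied before the substitution runs.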
Checklist
0 (if the version changed)
conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)