Combined updates for 2.10.x #475

Merged
h-vetinari merged 13 commits into conda-forge:main from mgorny:rebuild-fmt121_spdlog117-0-1_h983fc7
Jan 24, 2026

@mgorny
Contributor

mgorny commented Jan 10, 2026

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Combined updates for 2.10.x. So far with [ci skip] on top, we'll run it when the next RC or final is available.

Signed-off-by: Michał Górny <mgorny@quansight.com>
@conda-forge-admin
Contributor

conda-forge-admin commented Jan 14, 2026

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The magma output has been superseded by libmagma-devel.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/21220733650. Examine the logs at this URL for more detail.

Signed-off-by: Michał Górny <mgorny@quansight.com>
Comment thread recipe/meta.yaml Outdated
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
@mgorny
Contributor Author

mgorny commented Jan 17, 2026

@h-vetinari, do we want to include CUDA 13 migration for when the final is released?

@h-vetinari
Member

> @h-vetinari, do we want to include CUDA 13 migration for when the final is released?

As long as you use a development install of smithy (combined with the skip from #332, so that CPU builds run on non-GPU agents), that's OK for me. We also won't be able to test the GPU paths for CUDA 13, but I guess running the test suite on CUDA 12.x only is good enough.

@mgorny
Contributor Author

mgorny commented Jan 19, 2026

> > @h-vetinari, do we want to include CUDA 13 migration for when the final is released?
>
> As long as you use a development install of smithy (combined with the skip from #332, so that CPU builds run on non-GPU agents), that's OK for me. We also won't be able to test the GPU paths for CUDA 13, but I guess running the test suite on CUDA 12.x only is good enough.

I suppose new conda-smithy will be released before the final version.

@mgorny
Contributor Author

mgorny commented Jan 19, 2026

The aarch64 build hit the "exec format error" problem again, and the mkl/CUDA x86-64 build seems to have hit some builder issue — the logs are cut short, and GitHub seemed to be confused over whether it actually failed or was still running.

@mgorny
Contributor Author

mgorny commented Jan 20, 2026

Uh, aarch64 keeps failing with that "Exec format error". I wonder how it is that it happens only in some runs.

And then mkl run timed out. FWICS the non-mkl build took over 19 hours anyway, so we probably need to consider increasing timeouts again.

Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
…6.01.20.09.33.33

Other tools:
- conda-build 25.11.1
- rattler-build 0.55.0
- rattler-build-conda-compat 1.4.10
@h-vetinari
Member

One test failure on osx-arm64; I can add a skip for that while merging (assuming the rest doesn't blow up)

=========================== short test summary info ============================
FAILED [0.3161s] test/test_nn.py::TestNNDeviceTypeMPS::test_LayerNorm_numeric_mps - AssertionError: Tensor-likes are not close!

Mismatched elements: 9437063 / 18874368 (50.0%)
Greatest absolute difference: 0.7128685712814331 at index (0, 8, 24, 54) (up to 1e-05 allowed)
Greatest relative difference: 0.7136273980140686 at index (0, 69, 57, 29) (up to 0 allowed)

To execute this test, run the following from the base repo dir:
    python test/test_nn.py TestNNDeviceTypeMPS.test_LayerNorm_numeric_mps

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
= 1 failed, 8672 passed, 4907 skipped, 57 xfailed, 65984 warnings in 462.48s (0:07:42) =
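
For context on the numbers in that failure message: "up to 1e-05 allowed" and "up to 0 allowed" read as an absolute tolerance of 1e-05 and a relative tolerance of 0. Below is a minimal pure-Python sketch of the closeness criterion documented for `torch.testing.assert_close`, namely |actual - expected| <= atol + rtol * |expected|. The helper name `within_tolerance` and the reference value `expected = 1.0` are illustrative assumptions and are not taken from the test itself.

```python
def within_tolerance(actual: float, expected: float, *,
                     rtol: float = 0.0, atol: float = 1e-5) -> bool:
    """Return True if |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Tolerances as reported in the log: atol=1e-05, rtol=0.
expected = 1.0  # hypothetical reference value, chosen only for illustration
worst_abs_diff = 0.7128685712814331  # greatest absolute difference from the log

# A drift well below atol is accepted.
print(within_tolerance(expected + 5e-6, expected))
# A gap as large as the one reported is rejected.
print(within_tolerance(expected + worst_abs_diff, expected))
```

With a relative tolerance of 0, the check reduces to a fixed absolute bound, so a difference of about 0.71 on half of the elements is far outside anything that rounding noise would explain.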

@mgorny
Contributor Author

mgorny commented Jan 22, 2026

Uh, osx-64 looks like it failed with some cmake package weirdness (temporary bug? flakiness?).

For the linux-64 failures, I can't seem to get proper logs from GitHub. Raw logs don't work, and they're not in the "log archive". I suspect some GitHub internal issue broke these builds.

@h-vetinari
Member

Sigh, some BS flake in the other linux-64 + CUDA job as well

Solving environment (_h_env): ...working... failed
WARNING: failed to get package records, retrying.  exception was: Unsatisfiable dependencies for platform linux-64: {MatchSpec("packaging=26.0[build=pyhcf101f3_0]")}
Encountered problems while solving:
  - unsupported request
  - unsupported request
  - unsupported request

Could not solve for environment specs
The following packages are incompatible
├─ packaging =26.0 pyhcf101f3_0 does not exist (perhaps a typo or a missing channel);
├─ setuptools =80.10.1 pyh332efcf_0 does not exist (perhaps a typo or a missing channel);
└─ wheel =0.46.2 pyhd8ed1ab_0 does not exist (perhaps a typo or a missing channel).

@mgorny
Contributor Author

mgorny commented Jan 23, 2026

Note to self: enable py3.14 testing on triton-feedstock once we have PyTorch py3.14 packages.

@RoyiAvital

That was fast (Making all pass)!

@h-vetinari
Member

h-vetinari commented Jan 24, 2026

I wouldn't call that fast... We've been running CI for the last ~60h. 😅

Given that, @mgorny and I decided to merge this as-is. Other updates (python 3.14, triton bump, CUDA 13, maybe fixing #459) can come in the next update.

Member

h-vetinari left a comment

Thanks a lot @mgorny!

h-vetinari merged commit b78513c into conda-forge:main Jan 24, 2026
33 of 46 checks passed
@RoyiAvital

It was fast in the sense of being able to serve users once the official version is released.
It means all the optimizations done by you in previous versions paid off and now it seems the process is "easier".

Anyhow, I appreciate all the knowledge and effort from you, @mgorny and @h-vetinari.
Your efforts produce the simplest and most robust way to use PyTorch with Python.

@h-vetinari
Member

Thanks for the kind words!

> It means all the optimizations done by you in previous versions paid off and now it seems the process is "easier".

Yeah, I've been saying (hoping 😅) that I think the worst of the delays should hopefully be behind us, now that we've switched all platforms to run on special-purpose CI here. Still not exactly easy when a full build for a single commit can take 2-3 days, but hopefully we'll be able to keep things running smoothly!

@RoyiAvital

While all binaries for other systems are up, I don't see version 2.10 with CUDA for Windows 64:

image

There is only the CPU version.

@h-vetinari
Member

h-vetinari commented Jan 25, 2026

This is due to conda/infrastructure#1159 and needs manual work-arounds. Please be patient.

@h-vetinari
Member

$ gh run download 21312278993 --repo conda-forge/pytorch-cpu-feedstock --name conda_artifacts_21312278993_win_64_channel_targetsconda-forge_maincu_hca575dce
$ unzip pytorch-cpu-feedstock_conda_artifacts_.zip
$ cd bld/win-64 && rm current_repodata.json index.html repodata*
$ ls
libtorch-2.10.0-cuda128_mkl_h5af97b6_300.conda       pytorch-gpu-2.10.0-cuda128_mkl_hc88b545_300.conda
pytorch-2.10.0-cuda128_mkl_py310_h1968e09_300.conda  pytorch-tests-2.10.0-cuda128_mkl_py310_hf0eca92_300.conda
pytorch-2.10.0-cuda128_mkl_py311_h7fbd949_300.conda  pytorch-tests-2.10.0-cuda128_mkl_py311_hc85c64c_300.conda
pytorch-2.10.0-cuda128_mkl_py312_h5b42cb5_300.conda  pytorch-tests-2.10.0-cuda128_mkl_py312_hb3d0777_300.conda
pytorch-2.10.0-cuda128_mkl_py313_ha1b8ff3_300.conda  pytorch-tests-2.10.0-cuda128_mkl_py313_hd85d54a_300.conda
$ ls | xargs anaconda upload
$ DELEGATE=h-vetinari
PACKAGE_VERSION=2.10.0
for package in libtorch pytorch pytorch-gpu pytorch-tests; do
  anaconda copy --from-label main --to-label main --to-owner conda-forge ${DELEGATE}/${package}/${PACKAGE_VERSION}
done
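
A copy loop like the one above can be previewed before it is run. The sketch below only builds and prints the `anaconda copy` invocations, using the same flags that appear in the commands above; it never calls the CLI. The function name `build_copy_commands` and its parameters are illustrative assumptions, not part of any conda-forge tooling.

```python
import shlex


def build_copy_commands(delegate, version, packages,
                        to_owner="conda-forge", label="main"):
    """Build one `anaconda copy` argv list per package, mirroring the shell loop."""
    return [
        ["anaconda", "copy",
         "--from-label", label,
         "--to-label", label,
         "--to-owner", to_owner,
         f"{delegate}/{package}/{version}"]
        for package in packages
    ]


commands = build_copy_commands(
    "h-vetinari", "2.10.0",
    ["libtorch", "pytorch", "pytorch-gpu", "pytorch-tests"],
)
for argv in commands:
    # Dry run: print the command line only; nothing is executed.
    print(shlex.join(argv))
```

Keeping each command as an argument list means the same data could later be handed to `subprocess.run` without any shell quoting concerns, once the printed output has been checked.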

@RoyiAvital

I have all patience. Just reported what I saw.

@h-vetinari
Member

Well... It's not adding much useful information; you could check first that the CI run failed to upload the packages, or how this was done in any other PR that was merged in the last few months; then you would very likely have noticed what's going on.

@h-vetinari
Member

> And then mkl run timed out.

CUDA+MKL took 21h 18min after merging this PR... Pretty insane, but it only seems to be going up. 😑

@RoyiAvital

You're assuming knowledge some (most??) users do not have. At least not me.
Hence I thought it was worth reporting.

@h-vetinari
Member

> You're assuming knowledge some (most??) users do not have. At least not me.
> Hence I thought it was worth reporting.

I'm not assuming that you know. I'm asking that you look at obvious things (like the CI status on main, including clicking through to the logs to see what actually failed; ideally also doing some minimal research on the failure you've discovered using for example the search function on the feedstock) before reporting things that are completely trivial.

@h-vetinari
Member

h-vetinari commented Jan 25, 2026

> Uh, aarch64 keeps failing with that "Exec format error". I wonder how it is that it happens only in some runs.

aarch+CUDA failed again. This seems to be a recurring flake as well

=========================== short test summary info ============================
FAILED [0.6125s] test/test_torch.py::TestTorch::test_terminate_handler_on_crash - OSError: [Errno 8] Exec format error: '$PREFIX/bin/python3.12'
= 1 failed, 7584 passed, 1487 skipped, 31 xfailed, 51965 warnings in 4159.72s (1:09:19) =
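
As a reading aid for the summary line above: `[Errno 8]` is `ENOEXEC`, the error a kernel returns when it is asked to execute a file that is not in an executable format it recognizes. The sketch below only decodes the number with the Python standard library; it makes no claim about why the interpreter under `$PREFIX` triggers this intermittently on the aarch64 builders.

```python
import errno
import os

code = 8  # errno from the log line: OSError: [Errno 8] Exec format error
name = errno.errorcode.get(code, "unknown")

# Print the symbolic name next to the platform's own description of the error.
print(name, "-", os.strerror(code))
```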

I'll push another skip to main

@h-vetinari
Member

OK, all the builds are up now, I'll start the migration

> Other updates (python 3.14, triton bump, CUDA 13, maybe fixing #459) can come in the next update.

@mgorny, are you planning to tackle this? I tried debugging #459 a bit, but cannot understand how std::string seems to have two conflicting stdlib implementations in play (presumably once STL and once something CUDA-related and/or with some flags that change the ABI; there's a PR on the pytorch_scatter feedstock linked in that issue that can be used for debugging).

6 participants