2.9.0 osx testing #446

Closed
mgorny wants to merge 31 commits into conda-forge:main from mgorny:2.9.0-osx

@mgorny
Contributor

mgorny commented Nov 5, 2025

Splitting to run it independently of the Linux/Windows builds.

mgorny and others added 15 commits November 2, 2025 18:06
Update for 2.9.0, and rebase patches.

Signed-off-by: Michał Górny <mgorny@quansight.com>
The tested assertion started failing on AArch64 cross builds; make
it print the actual value to aid debugging (now and in the future).

Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
Our old patch is no longer sufficient for cross-compilation. However,
the new code checks for the PYTORCH_BLAS_USE_CBLAS_DOT envvar.

Signed-off-by: Michał Górny <mgorny@quansight.com>
It looks like we now need to explicitly pass CUDA target include
and `libcuda.so` stub directories while building inductor.  Introduce
a substitution `@CUDA_TARGET@` value in the patch, and replace it with
appropriate target in `build.sh`.

This leaves the unsubstituted `@CUDA_TARGET@` path on Windows, but that
shouldn't do any harm -- the path will simply not exist.

Signed-off-by: Michał Górny <mgorny@quansight.com>
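
The placeholder approach described in the `@CUDA_TARGET@` commit message can be sketched roughly as follows. This is a Python illustration only: the recipe performs the replacement in `build.sh`, and the function name, the sample line, and the `x86_64-linux` value below are assumptions made for the example, not taken from the recipe or the patch.

```python
PLACEHOLDER = "@CUDA_TARGET@"


def substitute_cuda_target(text: str, cuda_target: str) -> str:
    """Replace every occurrence of the placeholder with the given target."""
    return text.replace(PLACEHOLDER, cuda_target)


# Hypothetical line standing in for a path that the patch introduces.
sample = "include_dir = '$PREFIX/targets/@CUDA_TARGET@/include'"

# With a substitution applied, the placeholder no longer appears.
substituted = substitute_cuda_target(sample, "x86_64-linux")

# Without a substitution (the Windows case from the commit message), the
# placeholder stays in the path, which then simply does not exist on disk.
unsubstituted = sample

print(substituted)
print(unsubstituted)
```

Leaving the placeholder untouched is harmless only as long as consumers of the path tolerate a missing directory, which is what the commit message assumes.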
Inductor now started requiring fmt headers, and since we are moving
includes from site-packages to the top-level include directory, having
PyTorch install fmt headers there would conflict with system fmt
install.  However, PyTorch nowadays uses fmt 12, so let's just use
the system library instead.

This uses a WIP patch submitted upstream along with a quick hack
to make kineto build.

Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
Upstream now does not build fbgemm on Windows by default, and if we try
to force building it, it just fails.

Signed-off-by: Michał Górny <mgorny@quansight.com>
This is required by fmt headers, and upstream is doing it explicitly
in PyTorch's `CMakeLists.txt`.  I suppose all dependent projects will
have to follow suit.

Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
…and conda-forge-pinning 2025.11.05.12.33.05

Other tools:
- conda-build 25.9.0
- rattler-build 0.49.0
- rattler-build-conda-compat 1.4.9
Signed-off-by: Michał Górny <mgorny@quansight.com>
@conda-forge-admin
Contributor

conda-forge-admin commented Nov 5, 2025

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • ❌ In conda-forge.yml: $.github_actions = {'resize_win_partitions': True, 'self_hosted': True, 'store_build_artifacts': True, 'timeout_minutes': 1200, 'triggers': ['push', 'pull_request']}.

    {'resize_win_partitions': True, 'self_hosted': True, 'store_build_artifacts': True, 'timeout_minutes': 1200, 'triggers': ['push', 'pull_request']} is not valid under any of the given schemas

    Schema
    {
      "anyOf": [
        {
          "$ref": "#/$defs/GithubActionsConfig"
        },
        {
          "type": "null"
        }
      ]
    }

For recipe/meta.yaml:

  • ℹ️ The magma output has been superseded by libmagma-devel.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/19195452456. Examine the logs at this URL for more detail.

@mgorny mgorny mentioned this pull request Nov 6, 2025
3 tasks
@mgorny
Contributor Author

mgorny commented Nov 6, 2025

@h-vetinari, disabling FBGEMM didn't seem to change much. I've now also disabled MKLDNN but TBH I don't think this is the right way forward.

Hopefully conda-forge/.cirun#118 + conda-forge/.cirun#119 will let us switch runners.

@mgorny
Contributor Author

mgorny commented Nov 8, 2025

Now, are the runners busy or is there still something wrong with the labels?

@mgorny
Contributor Author

mgorny commented Nov 8, 2025

(I've already fixed the wrong bug number in the commit message locally.)

@h-vetinari
Member

> Now, are the runners busy or is there still something wrong with the labels?

Given that this is our first foray into the osx runners, I'm assuming something's not yet configured right. CC @aktech

@conda-forge-admin
Contributor

conda-forge-admin commented Nov 13, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The magma output has been superseded by libmagma-devel.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/19606522025. Examine the logs at this URL for more detail.

@mgorny
Contributor Author

mgorny commented Nov 13, 2025

Okay, that's something new. Either insufficient permissions on the runner, or a bad action script?

@aktech

aktech commented Nov 13, 2025

> Either insufficient permissions on the runner, or a bad action script?

Most likely "insufficient permissions", I'll update the macos image.

Fixes conda-forge#429

Signed-off-by: Michał Górny <mgorny@quansight.com>
…5.11.13.12.38.23

Other tools:
- conda-build 25.9.0
- rattler-build 0.49.0
- rattler-build-conda-compat 1.4.9
Signed-off-by: Michał Górny <mgorny@quansight.com>
…5.11.13.12.38.23

Other tools:
- conda-build 25.9.0
- rattler-build 0.49.0
- rattler-build-conda-compat 1.4.9
Signed-off-by: Michał Górny <mgorny@quansight.com>
Signed-off-by: Michał Górny <mgorny@quansight.com>
…5.11.14.03.49.41

Other tools:
- conda-build 25.9.0
- rattler-build 0.49.0
- rattler-build-conda-compat 1.4.9
@mgorny
Contributor Author

mgorny commented Nov 14, 2025

@h-vetinari, on a semi-related note, I suppose we want to start doing "megabuilds" here as well after the switch, right?

@h-vetinari
Member

Yeah, would be great to use megabuilds for osx too! :)

Signed-off-by: Michał Górny <mgorny@quansight.com>
…5.11.14.11.30.28

Other tools:
- conda-build 25.9.0
- rattler-build 0.49.0
- rattler-build-conda-compat 1.4.9
Signed-off-by: Michał Górny <mgorny@quansight.com>
…5.11.14.11.30.28

Other tools:
- conda-build 25.9.0
- rattler-build 0.49.0
- rattler-build-conda-compat 1.4.9
Signed-off-by: Michał Górny <mgorny@quansight.com>
…5.11.17.12.05.48

Other tools:
- conda-build 25.9.0
- rattler-build 0.49.0
- rattler-build-conda-compat 1.4.9
@aktech

aktech commented Nov 19, 2025

> Most likely "insufficient permissions", I'll update the macos image.

I built another image and also took another look at the error above. The error is valid: you can't create directories there without sudo. The runnerx user already has passwordless sudo permissions.

This behaviour is consistent with the GitHub runner images (macos-latest) as well:

[Image: Screenshot 2025-11-19 at 4 33 55 pm]

Also FYI: I am updating the images with the latest Xcode regardless (orthogonal to the issues in this PR).

@mgorny
Contributor Author

mgorny commented Nov 19, 2025

I'm going to try something cursed here, to at least confirm if lack of sudo is the only problem.

@aktech

aktech commented Nov 19, 2025

FYI: I am in the process of upgrading the macOS runner server to a more powerful one, which means the runners will not spin up today. The migration should be complete tomorrow.

@mgorny
Contributor Author

mgorny commented Nov 19, 2025

Okay, thanks for telling me before I started frantically trying to figure out why they aren't starting again xP.

Signed-off-by: Michał Górny <mgorny@quansight.com>
@aktech

aktech commented Nov 22, 2025

@mgorny @h-vetinari The OSX CI is running now, with the new image. (Also fyi reminder: max concurrency is 2, due to Apple's Virtualisation limit on a single machine)

@h-vetinari
Member

That's amazing, thanks so much @aktech! The OSX CI didn't only run, it produced the builds and tests passed (except on osx-arm64 where we shouldn't run the tests obviously; possibly test: native_and_emulated isn't correctly skipping the tests for osx builds outside of azure yet). 🥳

I'm surprised though, I had thought that the scaleway machines are osx-arm64 already; clearly they're osx-64... I must have gotten this wrong somewhere, sorry.

In any case, this should unblock us ~completely. @mgorny, I suggest we don't touch the current setup to get 2.9 out the door, and take care of switching to megabuilds on osx afterwards.

@h-vetinari
Member

h-vetinari commented Nov 22, 2025

> I'm surprised though, I had thought that the scaleway machines are osx-arm64 already; clearly they're osx-64... I must have gotten this wrong somewhere, sorry.

Never mind, I just need to look at the config in this PR:

build_platform:
  osx_64: osx_arm64
provider:
  osx_arm64: github_actions

OTOH, that means that our (now native) osx-arm64 builds aren't passing the tests 😑
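
To make that reading of the config explicit, here is a small Python sketch. It assumes the usual conda-smithy interpretation, namely that `build_platform` maps a target platform to the platform it is built on, that unlisted targets build natively, and that `provider` is keyed by the build platform. The helper name `build_host` is hypothetical.

```python
# The two mappings quoted from the conda-forge.yml in this PR.
build_platform = {"osx_64": "osx_arm64"}
provider = {"osx_arm64": "github_actions"}


def build_host(target: str) -> str:
    """Return the platform a target is built on (native unless overridden)."""
    return build_platform.get(target, target)


for target in ("osx_64", "osx_arm64"):
    host = build_host(target)
    kind = "native" if host == target else "cross-compiled"
    print(f"{target}: built on {host} ({kind}), CI: {provider.get(host)}")
```

Under these assumptions both targets run on arm64 machines, so the osx-arm64 build is native and its test failures cannot be attributed to emulation or cross-compilation.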

@h-vetinari
Member

OK, with the fixes from #448, the test suite is looking pretty good, except the following two failures:

=================================== FAILURES ===================================
______________ TestModuleMPS.test_forward_nn_Bilinear_mps_float16 ______________
[gw0] darwin -- Python 3.10.19 $PREFIX/bin/python3.10
Unexpected success
____________________________ TestTorch.test_qengine ____________________________
[gw0] darwin -- Python 3.10.19 $PREFIX/bin/python3.10
self = <test_torch.TestTorch testMethod=test_qengine>
    @xfailIfS390X
    def test_qengine(self):
        qengines = torch.backends.quantized.supported_engines
        original_qe = torch.backends.quantized.engine
        for qe in qengines:
            torch.backends.quantized.engine = qe
            assert torch.backends.quantized.engine == qe, 'qengine not set successfully'
>       torch.backends.quantized.engine = original_qe
test/test_torch.py:9488: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <torch.backends.quantized._QEngineProp object at 0x11b3688b0>
obj = <module 'torch.backends.quantized' from '/Users/runnerx/miniforge3/conda-bld/libtorch_1763869149195/_test_env_placehol...cehold_placehold_placehold_placehold_placehold_plac/lib/python3.10/site-packages/torch/backends/quantized/__init__.py'>
val = 'none'
    def __set__(self, obj, val: str) -> None:
>       torch._C._set_qengine(_get_qengine_id(val))
E       RuntimeError: quantized engine NoQEngine is not supported
E       
E       To execute this test, run the following from the base repo dir:
E           python test/test_torch.py TestTorch.test_qengine
E       
E       This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.10/site-packages/torch/backends/quantized/__init__.py:37: RuntimeError
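
The traceback shows the loop over the supported engines passing and the final restore step raising with `val = 'none'`. The pure-Python sketch below mimics only that pattern, without importing torch. The class `FakeQuantizedBackend`, its validation rule, and the engine names passed to it are invented for illustration; why the real setter rejects the value on this build is not established here.

```python
class FakeQuantizedBackend:
    """Stand-in for a backend whose setter rejects unsupported engines."""

    def __init__(self, supported, initial):
        self.supported_engines = list(supported)
        self._engine = initial  # assumption: the initial value is not validated

    @property
    def engine(self):
        return self._engine

    @engine.setter
    def engine(self, value):
        # Assumption: anything outside the supported list is refused.
        if value not in self.supported_engines:
            raise RuntimeError(f"quantized engine {value} is not supported")
        self._engine = value


def roundtrip(backend):
    """Same shape as test_qengine: set each engine, then restore the original."""
    original = backend.engine
    for qe in backend.supported_engines:
        backend.engine = qe
        assert backend.engine == qe, "qengine not set successfully"
    backend.engine = original  # raises if the original is not settable


# Restoring works when the starting engine is itself supported.
roundtrip(FakeQuantizedBackend(["qnnpack"], "qnnpack"))

# It fails when the starting engine is one the setter refuses.
try:
    roundtrip(FakeQuantizedBackend(["qnnpack"], "none"))
except RuntimeError as exc:
    print("restore failed:", exc)
```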

The "unexpected success" is from a highly templated test and AFAICT the failure expectation comes from here.
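
For context, an "unexpected success" is reported when a test marked as an expected failure passes. The standard-library sketch below shows the mechanism with `unittest.expectedFailure`. PyTorch's own test decorators are more elaborate, and the test class here is invented for illustration.

```python
import unittest


class _ExpectedFailureDemo(unittest.TestCase):
    @unittest.expectedFailure
    def test_now_passes(self):
        # Marked as an expected failure, but the assertion holds.
        self.assertEqual(1 + 1, 2)


def run_demo() -> unittest.TestResult:
    """Run the demo test case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(_ExpectedFailureDemo)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_demo()
print("unexpected successes:", len(result.unexpectedSuccesses))
print("run counted as successful:", result.wasSuccessful())
```

Removing the stale expectation or skipping the test both make the run green again. Skipping is the less precise option, since it also hides a future regression.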

I'll see if skipping does the job

5 participants