
{ai}[foss/2025a] PyTorch v2.9.1, Triton v3.5.1 w/ CUDA 12.8.0 from wheels#25267

Open
lexming wants to merge 1 commit into easybuilders:develop from lexming:20260212232309_new_pr_PyTorch291

Conversation

@lexming
Contributor

@lexming lexming commented Feb 12, 2026

(created using eb --new-pr)

Depends on:

This is a binary installation of the official wheels from pytorch.org.

I did some benchmarking comparing this wheel-based installation with the usual build from source in EB, and the performance is practically the same for inference/training jobs on GPU. More detailed information is in easybuilders/easybuild#931

…ss-2025a-CUDA-12.8.0-whl.eb, Triton-3.5.1-gfbf-2025a-CUDA-12.8.0-whl.eb
@lexming lexming added the update label Feb 12, 2026
@github-actions github-actions bot added the 2025a issues & PRs related to 2025a common toolchains label Feb 12, 2026
@github-actions

Diff of new easyconfig(s) against existing ones is too long for a GitHub comment. Use --review-pr (and --review-pr-filter / --review-pr-max) locally.

@verdurin
Member

@lexming I've been trying this and I see an error when I use it in a venv:

[crm194@compgh001 ~]$ ml PyTorch/2.9.1-foss-2025a-CUDA-12.8.0-whl
[crm194@compgh001 ~]$ which python
/apps/eb/el9/2025a/aarch64/software/Python/3.13.1-GCCcore-14.2.0/bin/python
[crm194@compgh001 ~]$ python -m venv ~/venvs/gh200-$(date +%F)
[crm194@compgh001 ~]$ source ~/venvs/gh200-2026-03-13/bin/activate
(gh200-2026-03-13) [crm194@compgh001 ~]$ which python
~/venvs/gh200-2026-03-13/bin/python
(gh200-2026-03-13) [crm194@compgh001 ~]$ python
Python 3.13.1 (main, Mar  3 2026, 12:18:06) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    import torch
  File "/apps/eb/el9/2025a/aarch64/software/PyTorch/2.9.1-foss-2025a-CUDA-12.8.0-whl/lib/python3.13/site-packages/torch/__init__.py", line 35, in <module>
    from typing_extensions import ParamSpec as _ParamSpec, TypeIs as _TypeIs
ModuleNotFoundError: No module named 'typing_extensions'
>>>

@verdurin
Member

Loading setuptools-rust fixes the error in the venv.

@lexming
Contributor Author

lexming commented Mar 13, 2026

@verdurin loading setuptools-rust is the wrong solution; typing_extensions is already provided by Python-3.13.1-GCCcore-14.2.0.eb in 2025a. So I guess that either you have a custom Python in your install without typing_extensions, or you need to create your venv with --system-site-packages to properly pick up the extensions from the loaded modules.
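The suggested fix can be sketched as follows (module name taken from the log above; the venv path is illustrative):

```shell
# Load the wheel-based PyTorch module first (name as in the log above)
ml PyTorch/2.9.1-foss-2025a-CUDA-12.8.0-whl

# Create the venv with access to the site-packages of loaded modules,
# so typing_extensions (shipped with the Python module) stays importable
python -m venv --system-site-packages ~/venvs/gh200-demo
source ~/venvs/gh200-demo/bin/activate

# The flag is recorded in the venv's pyvenv.cfg
grep include-system-site-packages ~/venvs/gh200-demo/pyvenv.cfg
# include-system-site-packages = true
```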

@verdurin
Member

Can confirm that works.

@backelj
Contributor

backelj commented Mar 20, 2026

@lexming these PyTorch easyconfigs are missing extra path settings:

modextrapaths = {
    'CMAKE_PREFIX_PATH': 'lib/python%(pyshortver)s/site-packages/torch',
    'LD_LIBRARY_PATH': 'lib/python%(pyshortver)s/site-packages/torch/lib',
    'LIBRARY_PATH': 'lib/python%(pyshortver)s/site-packages/torch/lib',
}

These were added by the 'original' PyTorch easyconfigs (which use the pytorch easyblock), but not by the PythonBundle ones.
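For context, in the wheel-based easyconfig that snippet would sit roughly as follows (an illustrative fragment, not taken from this PR; the filename is inferred from the module name in the logs above):

```python
# PyTorch-2.9.1-foss-2025a-CUDA-12.8.0-whl.eb (illustrative fragment)
easyblock = 'PythonBundle'

# Make the bundled torch libs findable by downstream builds,
# as the pytorch easyblock does for from-source installations
modextrapaths = {
    'CMAKE_PREFIX_PATH': 'lib/python%(pyshortver)s/site-packages/torch',
    'LD_LIBRARY_PATH': 'lib/python%(pyshortver)s/site-packages/torch/lib',
    'LIBRARY_PATH': 'lib/python%(pyshortver)s/site-packages/torch/lib',
}
```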

@lexming
Contributor Author

lexming commented Mar 25, 2026

@backelj we could indeed make those libs findable through search paths; that's probably a harmless change. However, software that builds on top of PyTorch usually uses PyTorch's own build tooling, which can automatically provide the paths to those libs. On my side, I have already installed a bunch of easyconfigs on top of this without issues. Can you tell me which package failed to build for you?

@backelj
Contributor

backelj commented Mar 25, 2026

> @backelj Can you tell me what package failed to build for you?

I encountered the issue when trying to build OpenMM-Torch-1.5.1-foss-2025a.eb, see commit 5f771fc.
