Merged
16 changes: 12 additions & 4 deletions .github/workflows/test_pytorch_wheels.yml
@@ -79,7 +79,6 @@ jobs:
        shell: bash
    env:
      VENV_DIR: ${{ github.workspace }}/.venv
-     TORCH_VERSION: ${{ inputs.torch_version }}
      AMDGPU_FAMILY: ${{ inputs.amdgpu_family }}

    steps:
@@ -94,6 +93,15 @@ jobs:
        with:
          python-version: ${{ inputs.python_version }}

      # TODO: also upload and reference test report together with this logging?
      - name: Summarize workflow inputs
        run: |
          python build_tools/github_actions/summarize_test_pytorch_workflow.py \
            --torch-version=${{ inputs.torch_version }} \
            --pytorch-git-ref=${{ inputs.pytorch_git_ref }} \
            --index-url=${{ inputs.package_index_url }} \
            --index-subdir=${{ inputs.amdgpu_family }}

      # Here we checkout the same version of PyTorch that wheels were built from
      # so we have the right set of test source files. We _probably_ don't need
      # to run HIPIFY or apply any patches, so we skip those steps to save time.
@@ -113,9 +121,9 @@ jobs:
      - name: Set up virtual environment
        run: |
          python build_tools/setup_venv.py ${VENV_DIR} \
-           --packages torch==${TORCH_VERSION} \
-           --index-url ${{ inputs.package_index_url }} \
-           --index-subdir ${{ inputs.amdgpu_family }} \
+           --packages torch==${{ inputs.torch_version }} \
+           --index-url=${{ inputs.package_index_url }} \
+           --index-subdir=${{ inputs.amdgpu_family }} \
            --activate-in-future-github-actions-steps

      - name: Install test requirements
112 changes: 112 additions & 0 deletions build_tools/github_actions/summarize_test_pytorch_workflow.py
@@ -0,0 +1,112 @@
#!/usr/bin/env python3

"""
This summarizes the environment setup steps for the
.github/workflows/test_pytorch_wheels.yml workflow.
Comment on lines +3 to +5
Member Author


  1. I am not sure if we want to have it running all the time in the pipeline?

Wouldn't it be less noise if we have a markdown doc ("here are the steps to do and export the variables from the CI run") and then have the CI just print:

export AMDGPU_FAMILY="gfx1151"
export TORCH_VERSION="2.7.1+rocm7.10.0a20251120"
export PYTORCH_GIT_REF="release/2.7"

I hear that, yeah. I think for many contributors who aren't as familiar with each CI pipeline, having a nicely formatted summary will make workflow results easier to understand. I know where in the logs to look for reproduction steps across each workflow type, but many developers do not.

Contributor


See my comments in some of the discussions above: maybe have a generic file in doc/ on how to set up Docker for PyTorch tests, and then have the CI reference that doc by URL together with the PyTorch test command.

Member Author


Here's a new attempt: https://gist.github.com/ScottTodd/6a465a4958fdaea59ede417434ba64b4#file-v4-md, which I'll write the code for now.

The output will be in this format:


PyTorch Test Report

To reproduce, see Running/testing PyTorch and setup with:

# Fetch pytorch source files, including tests:
git clone --branch release/2.7 --origin rocm https://github.com/ROCm/pytorch.git

# Install torch and test requirements
pip install ^
  --index-url=https://rocm.nightlies.amd.com/v2-staging/gfx110X-dgpu ^
  torch==2.7.1+rocm7.10.0a20251120
pip install -r pytorch/.ci/docker/requirements-ci.txt


It is intended to be run from within that workflow and writes markdown to the
GITHUB_STEP_SUMMARY file.

The script can be tested locally with inputs like this:

python ./build_tools/github_actions/summarize_test_pytorch_workflow.py \
--pytorch-git-ref=release/2.7 \
--index-url=https://rocm.nightlies.amd.com/v2-staging \
--index-subdir=gfx110X-dgpu \
--torch-version=2.7.1+rocm7.10.0a20251120
"""

import argparse
import os
import platform
import sys  # Used for the --python-version default below.

from github_actions_utils import *


def is_windows() -> bool:
    return platform.system() == "Windows"


LINE_CONTINUATION_CHAR = "^" if is_windows() else "\\"
LINE_CONTINUATION = f" {LINE_CONTINUATION_CHAR}\n "


def run(args: argparse.Namespace):
    index_url = f"{args.index_url}/{args.index_subdir}/"
    pytorch_repo_org = "pytorch" if args.pytorch_git_ref == "nightly" else "ROCm"
    pytorch_origin_args = "" if args.pytorch_git_ref == "nightly" else "--origin rocm"
    pytorch_remote_url = f"https://github.com/{pytorch_repo_org}/pytorch.git"
    pytorch_web_url = f"https://github.com/{pytorch_repo_org}/pytorch"
    pytorch_web_url_with_branch = f"{pytorch_web_url}/tree/{args.pytorch_git_ref}"

    # This report should be as brief as possible while still conveying what
    # is unique to the given arguments.

    summary = ""
    summary += "## PyTorch Test Report\n\n"

    # Summary information.
    summary += f"* Torch version: `{args.torch_version}`\n"
    summary += f"* Python version: `{args.python_version}`\n"
    summary += f"* GPU family: `{args.index_subdir}`\n"
    # index_url already ends with "/", so no extra separator is added here.
    summary += f"* Package index: {index_url}\n"
    summary += f"* PyTorch source code: {pytorch_web_url_with_branch}\n"

    # Link to detailed documentation.
    summary += "\n"
    summary += "To reproduce, see [Running/testing PyTorch](https://github.com/ROCm/TheRock/tree/main/external-builds/pytorch#runningtesting-pytorch) and setup with:\n"

    # Simple to copy/paste instructions to get the code and packages.
    summary += "\n"
    summary += "```bash\n"
    summary += "# Fetch pytorch source files, including tests:\n"
    summary += f"git clone --branch {args.pytorch_git_ref} {pytorch_origin_args} {pytorch_remote_url}\n"
    summary += "\n"
    summary += "# Install torch and test requirements\n"
    summary += "pip install" + LINE_CONTINUATION
    summary += f"--index-url={index_url}" + LINE_CONTINUATION
    summary += "torch"
    summary += f"=={args.torch_version}" if args.torch_version else ""
    summary += "\n"
    summary += "pip install -r pytorch/.ci/docker/requirements-ci.txt\n"
    summary += "```\n\n"

    gha_append_step_summary(summary)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Summarize test pytorch")
    parser.add_argument(
        "--torch-version",
        type=str,
        help="torch package version to install (e.g. '2.7.1+rocm7.10.0a20251120'), or empty for latest",
    )
    parser.add_argument(
        "--python-version",
        type=str,
        default=f"{sys.version_info[0]}.{sys.version_info[1]}",
        help="Python version to use for tests (defaults to sys.version_info as X.Y)",
    )
    parser.add_argument(
        "--pytorch-git-ref",
        type=str,
        default="nightly",
        help="PyTorch ref to checkout test sources from",
    )
    parser.add_argument(
        "--index-url",
        type=str,
        default="https://rocm.nightlies.amd.com/v2-staging",
        help="Full URL for a release index to use with 'pip install --index-url='",
    )
    # TODO: default the index subdir based on the current GPU somehow?
    # (share that logic with setup_venv.py if so)
    parser.add_argument(
        "--index-subdir",
        type=str,
        required=True,
        help="Index subdirectory (e.g. gfx110X-dgpu)",
    )
    args = parser.parse_args()

    run(args)
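
`gha_append_step_summary` comes from `github_actions_utils`, which this diff does not show. As a rough sketch of what such a helper might do, the example below appends markdown to the file named by the `GITHUB_STEP_SUMMARY` environment variable, which GitHub Actions sets for each step. The function name `append_step_summary` and the stdout fallback are assumptions for illustration, not the repository's implementation.

```python
import os
import sys


def append_step_summary(markdown: str) -> None:
    """Hypothetical stand-in for gha_append_step_summary (assumed behavior)."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        # Outside of GitHub Actions (e.g. local testing) there is no summary
        # file, so write the markdown to stdout instead.
        sys.stdout.write(markdown)
        return
    # Append so that summaries written by earlier calls are preserved.
    with open(summary_path, "a", encoding="utf-8") as f:
        f.write(markdown)
```

With a fallback like this, running the script locally without `GITHUB_STEP_SUMMARY` set would still show the generated report.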
38 changes: 36 additions & 2 deletions external-builds/pytorch/README.md
@@ -173,6 +173,20 @@ mix/match build steps.

## Running/testing PyTorch

### Prerequisites

On Linux we run automated tests under our
[`no_rocm_image_ubuntu24_04.Dockerfile`](dockerfiles/no_rocm_image_ubuntu24_04.Dockerfile)
container. Docker is optional for developers and users. If you want to use our
test image, run it like so:

```bash
sudo docker run -it \
--device=/dev/kfd --device=/dev/dri \
--ipc=host --group-add=video --group-add=render --group-add=110 \
ghcr.io/rocm/no_rocm_image_ubuntu24_04:latest
```
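
The `--device` flags pass the host's `/dev/kfd` and `/dev/dri` nodes through to the container, so `docker run` is likely to fail if the host does not have them. A minimal pre-flight sketch follows; the helper name `missing_gpu_devices` is hypothetical and not part of this repository.

```python
import os


def missing_gpu_devices(paths=("/dev/kfd", "/dev/dri")):
    """Return the entries of `paths` that do not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    missing = missing_gpu_devices()
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("All expected device nodes are present")
```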

### Running ROCm and PyTorch sanity checks

The simplest tests for a working PyTorch with ROCm install are:
@@ -196,9 +210,29 @@ pytest -v smoke-tests

### Running full PyTorch tests

-See https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/3rd-party/pytorch-install.html#testing-the-pytorch-installation
We have a [`run_linux_pytorch_tests.py`](run_linux_pytorch_tests.py) script
which runs PyTorch unit tests using pytest with additional test exclusion
capabilities tailored for AMD ROCm GPUs. See the script for detailed
instructions. Here are a few examples:

```bash
# Basic usage (auto-detect everything):
python run_linux_pytorch_tests.py

-<!-- TODO(erman-gurses): update docs here -->
# Custom test selection with pytest -k:
python run_linux_pytorch_tests.py -k "test_nn and not test_dropout"

# Explicit pytorch repo path (for test sources) and GPU family (for filtering)
python run_linux_pytorch_tests.py --pytorch-dir=/tmp/pytorch --amdgpu-family=gfx950
```

Tests can also be run by following the ROCm documentation at
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/3rd-party/pytorch-install.html#testing-the-pytorch-installation.
For example:

```bash
PYTORCH_TEST_WITH_ROCM=1 python pytorch/test/run_test.py --include test_torch
```
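
When driving the tests from a Python script rather than a shell, the same invocation can be built with `subprocess`. This is a sketch that assumes the PyTorch sources were cloned to `./pytorch`; `build_run_test_command` and `run_pytorch_tests` are hypothetical helper names, not part of this repository.

```python
import os
import subprocess
import sys


def build_run_test_command(pytorch_dir="pytorch", include=("test_torch",)):
    """Build the (command, environment) pair for PyTorch's run_test.py."""
    cmd = [sys.executable, os.path.join(pytorch_dir, "test", "run_test.py")]
    if include:
        cmd += ["--include", *include]
    # Copy the current environment and set the ROCm flag used in the example above.
    env = dict(os.environ, PYTORCH_TEST_WITH_ROCM="1")
    return cmd, env


def run_pytorch_tests(pytorch_dir="pytorch", include=("test_torch",)):
    """Run the tests; raises CalledProcessError if run_test.py exits non-zero."""
    cmd, env = build_run_test_command(pytorch_dir, include)
    subprocess.run(cmd, env=env, check=True)
```

Keeping command construction separate from execution makes the command easy to log or inspect before a long test run starts.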

## Nightly releases
