Skip to content

Refactor do_build in build_prod_wheels.py for reviewability#3389

Merged
subodh-dubey-amd merged 3 commits into
mainfrom
users/subodh-dubey-amd/refactor-do-build-wheels-core
Feb 18, 2026
Merged

Refactor do_build in build_prod_wheels.py for reviewability#3389
subodh-dubey-amd merged 3 commits into
mainfrom
users/subodh-dubey-amd/refactor-do-build-wheels-core

Conversation

@subodh-dubey-amd
Copy link
Copy Markdown
Contributor

@subodh-dubey-amd subodh-dubey-amd commented Feb 12, 2026

Motivation

Preparatory refactor for sccache integration (PR #3171). Addresses reviewer feedback on build_prod_wheels.py being hard to review due to a single large do_build() function.

Technical Details

  • Extract core build steps (env setup, Triton, PyTorch, Audio, Vision, Apex, ccache stats) from do_build() into new _do_build_wheels_core() helper.
  • do_build() now handles only setup/orchestration and delegates to the helper.
  • Replace two redundant get_rocm_path("root") calls with the rocm_dir parameter.
  • Pure refactor — no new args, no sccache logic, no behavioral changes.

Test Result

No functional changes — refactored code follows the same execution path as before.

Submission Checklist

@subodh-dubey-amd subodh-dubey-amd changed the title **Refactor do_build in build_prod_wheels.py for reviewability** Refactor do_build in build_prod_wheels.py for reviewability Feb 12, 2026
@subodh-dubey-amd subodh-dubey-amd marked this pull request as ready for review February 12, 2026 12:17
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +536 to +537
"PYTORCH_ROCM_ARCH": pytorch_rocm_arch,
"USE_KINETO": os.environ.get("USE_KINETO", "ON" if not is_windows else "OFF"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I don't see either of these variables referenced in https://github.com/pytorch/vision/ or https://github.com/pytorch/audio/, but including them in the env used for all builds is probably harmless.

We could try moving some of these env vars out of "common" and into do_build_pytorch around here:

env["USE_ROCM"] = "ON"
env["USE_CUDA"] = "OFF"
env["USE_MPI"] = "OFF"
env["USE_NUMA"] = "OFF"
env["PYTORCH_BUILD_VERSION"] = pytorch_build_version
env["PYTORCH_BUILD_NUMBER"] = args.pytorch_build_number

(fine to keep this PR as a pure code move, without changing behavior, then attempt that later)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — both are pytorch-only but PYTORCH_ROCM_ARCH is read from env in do_build_pytorch() for flash attention checks, so moving it needs a param change. Keeping as-is for pure code move; tracked in #3425.

Comment on lines +531 to +535
env: dict[str, str] = {
"PYTHONUTF8": "1", # Some build files use utf8 characters, force IO encoding
"CMAKE_PREFIX_PATH": str(cmake_prefix),
"ROCM_HOME": str(rocm_dir),
"ROCM_PATH": str(rocm_dir),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line is pretty blurry here for what env setup goes in do_build() or _do_build_wheels_core(). Maybe we could add a dedicated setup_common_build_env() function that would contain all of this code?

env: dict[str, str] = {
"PYTHONUTF8": "1", # Some build files use utf8 characters, force IO encoding
"CMAKE_PREFIX_PATH": str(cmake_prefix),
"ROCM_HOME": str(rocm_dir),
"ROCM_PATH": str(rocm_dir),
"PYTORCH_ROCM_ARCH": pytorch_rocm_arch,
"USE_KINETO": os.environ.get("USE_KINETO", "ON" if not is_windows else "OFF"),
}
if args.use_ccache:
print("Building with ccache, clearing stats first")
env["CMAKE_C_COMPILER_LAUNCHER"] = "ccache"
env["CMAKE_CXX_COMPILER_LAUNCHER"] = "ccache"
run_command(["ccache", "--zero-stats"], cwd=tempfile.gettempdir())
# GLOO enabled for only Linux
if not is_windows:
env["USE_GLOO"] = "ON"
# At checkout, we compute some additional env vars that influence the way that
# the wheel is named/versioned.
if triton_dir:
triton_env_file = triton_dir / "build_env.json"
if triton_env_file.exists():
with open(triton_env_file, "r") as f:
addl_triton_env = json.load(f)
print(f"-- Additional triton build env vars: {addl_triton_env}")
env.update(addl_triton_env)
# With `CMAKE_PREFIX_PATH` set, `find_package(LLVM)` (called in
# `MLIRConfig.cmake` shipped as part of the LLVM bundled with
# trition) may pick up TheRock's LLVM instead of triton's.
# Here, `CMAKE_FIND_USE_CMAKE_ENVIRONMENT_PATH` is set
# and passed via `TRITON_APPEND_CMAKE_ARGS` to avoid this.
# See also https://github.com/ROCm/TheRock/issues/1999.
env["TRITON_APPEND_CMAKE_ARGS"] = (
"-DCMAKE_FIND_USE_CMAKE_ENVIRONMENT_PATH=FALSE"
)
if is_windows:
llvm_dir = rocm_dir / "lib" / "llvm" / "bin"
env.update(
{
"HIP_CLANG_PATH": str(llvm_dir.resolve().as_posix()),
"CC": str((llvm_dir / "clang-cl.exe").resolve()),
"CXX": str((llvm_dir / "clang-cl.exe").resolve()),
}
)
else:
env.update(
{
# Workaround GCC12 compiler flags.
"CXXFLAGS": " -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict ",
"CPPFLAGS": " -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict ",
}
)
# Workaround missing devicelib bitcode
# TODO: When "ROCM_PATH" and/or "ROCM_HOME" is set in the environment, the
# clang frontend ignores its default heuristics and (depending on version)
# finds the wrong path to the device library. This is bad/annoying. But
# the PyTorch build shouldn't even need these to be set. Unfortunately, it
# has been hardcoded for a long time. So we use a clang env var to force
# a specific device lib path to workaround the hack to get pytorch to build.
# This may or may not only affect the Python wheels with their own quirks
# on directory layout.
# Obviously, this should be completely burned with fire once the root causes
# are eliminted.
hip_device_lib_path = get_rocm_path("root") / "lib" / "llvm" / "amdgcn" / "bitcode"
if not hip_device_lib_path.exists():
print(
"WARNING: Default location of device libs not found. Relying on "
"clang heuristics which are known to be buggy in this configuration"
)
else:
env["HIP_DEVICE_LIB_PATH"] = str(hip_device_lib_path)
# OpenBLAS path setup
host_math_path = get_rocm_path("root") / "lib" / "host-math"
if not host_math_path.exists():
print(
"WARNING: Default location of host-math not found. "
"Will not build with OpenBLAS support."
)
else:
env["BLAS"] = "OpenBLAS"
env["OpenBLAS_HOME"] = str(host_math_path)
env["OpenBLAS_LIB_NAME"] = "rocm-openblas"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Extracted all env construction into a dedicated _setup_common_build_env() function.

@ScottTodd
Copy link
Copy Markdown
Member

Code changes seem fine but I want to wait until #3432 clears and we get a clean CI run before approving. See failures at https://github.com/ROCm/TheRock/actions/runs/22062019834/job/63760343692?pr=3389.

@ScottTodd
Copy link
Copy Markdown
Member

PyTorch builds are working again after 8940752. Can you sync the branch on this PR so checks run again with the latest code? (Or I can click the "update branch" button myself)

Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@subodh-dubey-amd subodh-dubey-amd merged commit fcf8f7f into main Feb 18, 2026
106 of 110 checks passed
@subodh-dubey-amd subodh-dubey-amd deleted the users/subodh-dubey-amd/refactor-do-build-wheels-core branch February 18, 2026 06:48
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Feb 18, 2026
jammm pushed a commit that referenced this pull request Feb 19, 2026
## Motivation

Preparatory refactor for sccache integration ([PR
#3171](#3171 (comment))).
Addresses [reviewer
feedback](#3171 (comment))
on `build_prod_wheels.py` being hard to review due to a single large
`do_build()` function.

## Technical Details

- Extract core build steps (env setup, Triton, PyTorch, Audio, Vision,
Apex, ccache stats) from `do_build()` into new `_do_build_wheels_core()`
helper.
- `do_build()` now handles only setup/orchestration and delegates to the
helper.
- Replace two redundant `get_rocm_path("root")` calls with the
`rocm_dir` parameter.
- **Pure refactor** — no new args, no sccache logic, no behavioral
changes.

## Test Result
No functional changes — refactored code follows the same execution path
as before.
- https://github.com/ROCm/TheRock/actions/runs/21945223080
After dedicated `_setup_common_build_env()` function:
- https://github.com/ROCm/TheRock/actions/runs/22062404175

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
subodh-dubey-amd added a commit that referenced this pull request Feb 20, 2026
## Motivation

Add sccache support to PyTorch wheel builds for S3-backed distributed
caching. Script placed in `build_tools/` per [reviewer
feedback](#3171 (comment)),
modeled after `build_tools/setup_ccache.py`.

Part of sccache PR sequence:
[#3369](#3369) →
[#3389](#3389) → **this** → workflow
wiring.

## Technical Details

- **New: `build_tools/setup_sccache_rocm.py`** — generic sccache ROCm
helper (CLI + importable):
  - `find_sccache()` — locate binary; hard fail if missing
- `setup_rocm_sccache()` — wrap clang/clang++ with sccache stubs (Linux
only)
  - `restore_rocm_compilers()` — undo wrapping
  
- **Modified: `external-builds/pytorch/build_prod_wheels.py`**:
  - `--use-ccache` / `--use-sccache` mutually exclusive args
- Both hard-fail with `RuntimeError` if the requested cache tool is not
found ([per
review](#3171 (comment)))
— no silent fallback
- Added explicit ccache availability check (previously would fail with
an unclear subprocess error)
- sccache: wrap compilers → set CMAKE launchers → `try`/`finally` around
build for guaranteed compiler restore + stats
- Moved ccache stats reporting into `finally` block for consistent
reporting on both success and failure

## Test Result

No workflow changes — sccache wired but not yet invoked by CI (next PR
adds `cache_type` input + AWS config).

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
subodh-dubey-amd added a commit that referenced this pull request Mar 5, 2026
## Summary

Adds `sccache` with S3 remote storage to all four PyTorch wheel build
workflows, significantly reducing build times through distributed
compiler caching.

**PR sequence:** #3369#3306#3389#3482 → **this** → #3189
([based on Reviewer's
Feedback](#3171 (comment)))

## How It Works

| | Linux | Windows |
|---|---|---|
| **Host C/C++** | CMake compiler launchers | CMake compiler launchers |
| **HIP device code** | Wraps ROCm `clang`/`clang++` with sccache | Not
supported |
| **Cleanup** | Restores original compilers via try/finally | N/A |

Cache is stored in the `therock-<workflow>-pytorch-sccache` S3 bucket,
keyed by `<os>/<arch>/` prefix.

## S3 Cache Configuration

Each workflow uses a dedicated S3 bucket and IAM role, keyed by
`<os>/<arch>/` prefix:

| Workflow | S3 Bucket | IAM Role |
|----------|-----------|----------|
| Linux CI | `therock-ci-pytorch-sccache` | `therock-ci` |
| Windows CI | `therock-ci-pytorch-sccache` | `therock-ci` |
| Linux Release | `therock-{release_type}-pytorch-sccache` |
`therock-{release_type}` |
| Windows Release | `therock-{release_type}-pytorch-sccache` |
`therock-{release_type}` |

Where `release_type` is one of: `dev`, `nightly`, `prerelease`.

##  Impact

| Platform | Cold → Warm | Improvement |
|----------|------------|-------------|
| Linux | ~70m → ~37m | **~49%** |
| Windows | ~42m → ~26m | **~38%** |

Windows is lower — sccache cannot wrap HIP device compilation on
Windows, only host C/C++ via CMAKE launchers.

## Tests

### Linux:
- [Linux (Cache
Population)](https://github.com/ROCm/TheRock/actions/runs/22226347964/job/64293924748)
- 70 mins
- [Linux (Cache
Hit)](https://github.com/ROCm/TheRock/actions/runs/22231743387/job/64312966557)
- 37 mins

### Windows:
- [Windows (Cache
Population)](https://github.com/ROCm/TheRock/actions/runs/22219252671/job/64280583887)
- 42 mins
- [Windows (Cache
Hit)](https://github.com/ROCm/TheRock/actions/runs/22223608689/job/64284721704)
- 26 mins

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.'



> Forks: S3 caching is only active for ROCm-owned runs. Fork users can
set cache_type to ccache or none, or leave the default — sccache will
work locally without S3 access.

---------
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants