Add sccache support for PyTorch builds#3171

Closed
subodh-dubey-amd wants to merge 26 commits into main from users/subodh-dubey-amd/ccache-pytorch

Contributor

@subodh-dubey-amd subodh-dubey-amd commented Jan 30, 2026

Summary

Adds sccache with AWS S3 remote storage for PyTorch wheel builds, significantly reducing build times through distributed compiler caching.

Key Features

  • S3-backed remote cache: Shared cache across CI runs using therock-pytorch-sccache bucket
  • Platform-specific cache keys: Organized by linux/<arch>/ and windows/<arch>/ prefixes
  • ROCm compiler wrapping (Linux): Wraps clang/clang++ in ROCm SDK with sccache for HIP compilation caching
  • CMake launcher integration: Uses CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER for host code caching
  • Automatic cleanup: Restores original compilers after build via try/finally block
  • Robust error handling: Safe wrapper creation with atomic operations and rollback on failure

How It Works

Linux

  1. Downloads and installs sccache binary
  2. Wraps ROCm's clang/clang++ with sccache wrapper scripts
  3. Sets CMake compiler launchers for host code
  4. Caches compilation artifacts to S3
  5. Restores original compilers on completion
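The Linux steps above can be sketched roughly as follows. This is a minimal illustration, not the code in setup_sccache_rocm.py: the names `wrap_compiler`, `restore_compiler`, and `sccache_wrapped`, and the `.orig` backup suffix, are assumptions made for the example.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


def wrap_compiler(compiler: Path, sccache: Path) -> Path:
    """Move the real compiler aside and put an sccache stub in its place."""
    backup = compiler.with_name(compiler.name + ".orig")  # hypothetical suffix
    if backup.exists():
        raise RuntimeError(f"{backup} already exists; compiler may already be wrapped")
    compiler.rename(backup)
    try:
        # Write the stub to a temp file first, then swap it in atomically.
        tmp = compiler.with_name(compiler.name + ".tmp")
        tmp.write_text(f'#!/bin/sh\nexec "{sccache}" "{backup}" "$@"\n')
        tmp.chmod(0o755)
        os.replace(tmp, compiler)
    except Exception:
        backup.rename(compiler)  # roll back so the SDK is left untouched
        raise
    return backup


def restore_compiler(compiler: Path, backup: Path) -> None:
    """Put the original compiler binary back over the stub."""
    os.replace(backup, compiler)


@contextmanager
def sccache_wrapped(compilers, sccache: Path):
    """Wrap each compiler for the duration of the block, then always restore."""
    wrapped = []
    try:
        for compiler in compilers:
            wrapped.append((compiler, wrap_compiler(compiler, sccache)))
        yield
    finally:
        for compiler, backup in reversed(wrapped):
            restore_compiler(compiler, backup)
```

A build step would then run inside `with sccache_wrapped([clang, clangxx], sccache_path): ...`, so the originals come back even if the build raises.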

Windows

  1. Downloads and installs sccache.exe
  2. Sets CMake compiler launchers for C/C++ host code caching
  3. Caches compilation artifacts to S3

Configuration

Environment variables (set in workflow):

  • SCCACHE_BUCKET: S3 bucket name
  • SCCACHE_REGION: AWS region
  • SCCACHE_S3_KEY_PREFIX: Cache key prefix (os/arch)
  • SCCACHE_S3_SERVER_SIDE_ENCRYPTION: Enabled
  • SCCACHE_LOG: Set to warn for error/warning visibility
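As a rough sketch of how these variables fit together, the helper below builds the environment for one build. The function name `sccache_env` is hypothetical, the `"true"` value for server-side encryption is an assumption (the description only says "Enabled"), and the arch string in the usage note is a placeholder.

```python
from typing import Dict


def sccache_env(bucket: str, region: str, os_name: str, arch: str) -> Dict[str, str]:
    """Build the sccache/CMake environment described in the list above."""
    return {
        "SCCACHE_BUCKET": bucket,
        "SCCACHE_REGION": region,
        # Platform-specific prefix, e.g. linux/<arch> or windows/<arch>.
        "SCCACHE_S3_KEY_PREFIX": f"{os_name}/{arch}",
        # Assumed value; the PR description only says "Enabled".
        "SCCACHE_S3_SERVER_SIDE_ENCRYPTION": "true",
        "SCCACHE_LOG": "warn",
        # Host C/C++ code is cached through the CMake compiler launchers.
        "CMAKE_C_COMPILER_LAUNCHER": "sccache",
        "CMAKE_CXX_COMPILER_LAUNCHER": "sccache",
    }
```

For example, `sccache_env("therock-pytorch-sccache", "us-east-2", "linux", "gfx110X")` yields a key prefix of `linux/gfx110X`. The region matches the `aws-region` used in the workflow snippet quoted in the review.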

Files Changed

  • .github/workflows/build_portable_linux_pytorch_wheels.yml - Linux workflow with sccache config
  • .github/workflows/build_windows_pytorch_wheels.yml - Windows workflow with sccache config
  • external-builds/pytorch/build_prod_wheels.py - Build script with sccache integration
  • external-builds/pytorch/setup_sccache_rocm.py - New module for sccache setup and compiler wrapping

Testing

  • Linux release builds (gfx110X)
  • Linux nightly builds
  • Windows release builds (gfx110X)
  • Windows nightly - blocked by ROCm SDK 7.12.0 issue (unrelated to this PR)

Known Limitations

  1. Windows HIP device code: Not cached (sccache doesn't support HIP compiler launcher on Windows)
  2. Windows nightly: Failing due to a ROCm SDK 7.12.0 bug with HIP compiler detection (CMake passes MSVC linker flags to a GNU-like compiler). This is a pre-existing infrastructure issue, not caused by this PR, as of 30/01/2026:
    https://github.com/ROCm/TheRock/actions/runs/21508193319

Run 1 ( Cache Population )

Run 2 ( Cache Hit )

Linux PyTorch Build Times

| Release | Run 1 (Cache Population) | Run 2 (Cache Hit) | Time Saved | Improvement |
| --- | --- | --- | --- | --- |
| release/2.7 | 40m | 22m | 18m | 45% |
| release/2.8 | 50-51m | 26-27m | 24m | 48% |
| release/2.9 | 48-49m | 23-24m | 25m | 52% |
| release/2.10 | 52-53m | 27-29m | 24m | 47% |
| nightly | 53-54m | 28m | 25m | 47% |

Linux Average: ~48% improvement


Windows PyTorch Build Times

| Release | Run 1 (Cache Population) | Run 2 (Cache Hit) | Time Saved | Improvement |
| --- | --- | --- | --- | --- |
| release/2.9 | 71-79m | 58-59m | 15m | 19% |
| release/2.10 | 64-72m | 53-59m | 11m | 16% |
| nightly | ❌ Failed | ❌ Failed | - | - |

Windows Average: ~17% improvement (release builds only)


Summary: Build Time Improvements

| Platform | Cache Population | Cache Hit | Improvement |
| --- | --- | --- | --- |
| Linux | ~50m avg | ~25m avg | ~48% |
| Windows | ~72m avg | ~57m avg | ~17% |

Times vary based on cache hit rate and code changes
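For readers checking the figures, here is a small sketch of the arithmetic. It assumes each "Improvement" value is time saved divided by the cache-population time, and that the platform averages are the mean of the per-release percentages. The helper name `improvement_pct` is made up for the example.

```python
from statistics import mean


def improvement_pct(cold_minutes: float, warm_minutes: float) -> float:
    """Percentage of build time saved by a warm cache relative to a cold one."""
    return (cold_minutes - warm_minutes) / cold_minutes * 100


# release/2.7 on Linux: 40m cold, 22m warm.
linux_2_7 = round(improvement_pct(40, 22))

# Mean of the per-release Linux percentages from the table.
linux_average = mean([45, 48, 52, 47, 47])

# Mean of the two Windows release percentages.
windows_average = mean([19, 16])

print(linux_2_7, linux_average, windows_average)
```

Under these assumptions the Linux mean comes out just under 48 and the Windows mean at 17.5, in line with the "~48%" and "~17%" quoted above.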

Submission Checklist

…ncatenation instead of Path() for relative links
lld doesn't work with mixed GCC/Clang builds - Triton uses GCC which
doesn't support -fuse-ld=/path/to/lld syntax. Only Clang supports full
path linker specification.
… error handling and improve binary management.
…ws handling for ROCm builds; remove HIP compiler launcher due to compatibility issues.
@subodh-dubey-amd subodh-dubey-amd marked this pull request as ready for review January 30, 2026 12:55
Member

@ScottTodd ScottTodd left a comment

Cool, I think this is heading in a good direction.

Comment on lines +151 to +156
- name: Configure AWS Credentials for sccache
  if: ${{ github.repository_owner == 'ROCm' }}
  uses: aws-actions/configure-aws-credentials@61815dcd50bd041e203e49132bacad1fd04d2708 # v5.1.1
  with:
    aws-region: us-east-2
    role-to-assume: arn:aws:iam::692859939525:role/therock-${{ inputs.release_type }}
Member

These are the same roles as we use for uploading release files (python packages, artifacts). Do we want a separate role for using sccache?

Is the therock-pytorch-sccache bucket public read but private write, or private for both?

cc @marbre

Contributor Author

@subodh-dubey-amd subodh-dubey-amd Jan 31, 2026

The bucket is private and blocks all public access; it is accessible only through these roles: arn:aws:iam::692859939525:role/therock-${{ inputs.release_type }}

Using the existing release role because the sccache operations (S3 GetObject/PutObject) are a subset of what the arn:aws:iam::692859939525:role/therock-${{ inputs.release_type }} roles already have.

Member

Need @marbre and @amd-shiraz to weigh in here (and perhaps @amd-justchen too, given his prior work on the ccache server we use for building ROCm).

We need a clear, written policy for how cache buckets and access are handled. On other projects we've made these decisions:

  • CI cache buckets are world readable so developers can benefit from the CI cache
  • workflows running on schedule or push can read and write to the cache
  • workflows running on pull_request can only read from the cache
  • (what about workflow_dispatch?)

I'd like to apply the same policies for PyTorch and ROCm builds, so we aren't dealing with an explosion of different settings when we also enable caching for JAX and other projects.
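One way to express the policy listed above in code is sketched below. The names `CacheAccess` and `cache_access_for_event` are hypothetical, and `workflow_dispatch` is deliberately left unhandled because the comment leaves it as an open question.

```python
from enum import Enum


class CacheAccess(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


def cache_access_for_event(event_name: str) -> CacheAccess:
    """Map a GitHub Actions event name to the cache access it should get."""
    if event_name in ("schedule", "push"):
        return CacheAccess.READ_WRITE
    if event_name == "pull_request":
        return CacheAccess.READ_ONLY
    # workflow_dispatch and anything else: no policy decided yet, so fail loudly.
    raise ValueError(f"No cache access policy defined for event {event_name!r}")
```

Raising for undecided events keeps a missing policy visible instead of silently picking a default.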

Note that I also have #3303 open which creates a new workflow for building pytorch on CI. That will be the main place that a build cache will be needed. Having a cache for dev or nightly release builds is more of a nice-to-have given the reduced job frequency and lower bar for build cache integrity.

Member

Also another reminder: keep PRs small and focused. This takes weeks to review because the change does multiple things at once, and each related piece has open design questions.

PR sequence:

  1. Add sccache to dockerfiles
  2. Set workflows to use new dockerfiles
  3. Add sccache support to build scripts
  4. Have workflows use the new sccache support

Each of those would have significantly shorter review turnaround time.

Comment on lines +158 to +161
- name: Install sccache
  run: |
    pip install sccache
    sccache --version
Member

Let's get sccache into our build dockerfile, similar to ccache:

I don't trust a pip install in this workflow prior to the two steps below that select a python version and put that python version on PATH.

Contributor

I agree we should get this into our base image

Contributor Author

Done. Added sccache installation to the Docker image via dockerfiles/install_sccache.sh (similar pattern to ccache installation).

S3_BUCKET_PY: "therock-${{ inputs.release_type }}-python"
optional_build_prod_arguments: ""
# sccache configuration for ROCm compiler caching with S3 backend
SCCACHE_BUCKET: therock-pytorch-sccache
Member

We may want separate cache buckets (or namespaces) for dev, nightly, and stable releases.

Contributor Author

@subodh-dubey-amd subodh-dubey-amd Jan 31, 2026

Updated to use separate buckets per environment:

  • therock-dev-pytorch-sccache
  • therock-nightly-pytorch-sccache
  • therock-prerelease-pytorch-sccache

Each environment's IAM role (therock-dev, therock-nightly, therock-prerelease) has access only to its corresponding bucket.
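The naming scheme above could be derived from the workflow's `release_type` input along these lines. This is a sketch only: the helper name `sccache_bucket` is hypothetical, and the allowed values are taken from the three buckets listed.

```python
RELEASE_TYPES = ("dev", "nightly", "prerelease")


def sccache_bucket(release_type: str) -> str:
    """Return the per-environment sccache bucket name for a release type."""
    if release_type not in RELEASE_TYPES:
        raise ValueError(f"Unknown release type: {release_type!r}")
    return f"therock-{release_type}-pytorch-sccache"
```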

Member

Check with @marbre for these bucket configurations and role settings. This TheRock repository will retain access to the "dev" role but nightly and prerelease are moving to https://github.com/ROCm/rockrel.

Contributor Author

@marbre
Each environment's IAM role (therock-dev, therock-nightly, therock-prerelease) has access only to its corresponding bucket.

  • therock-dev-pytorch-sccache
  • therock-nightly-pytorch-sccache
  • therock-prerelease-pytorch-sccache

Attached the dev role policy screenshot. Do we need any changes here?

Comment on lines +441 to +453
except Exception as e:
    print(f"ERROR: sccache setup failed: {e}")
    print("Falling back to ccache for host code compilation...")
    args.use_sccache = False
    args.use_ccache = True
    env["CMAKE_C_COMPILER_LAUNCHER"] = "ccache"
    env["CMAKE_CXX_COMPILER_LAUNCHER"] = "ccache"
    try:
        run_command(["ccache", "--zero-stats"], cwd=tempfile.gettempdir())
    except Exception as ccache_error:
        print(f"WARNING: ccache fallback also failed: {ccache_error}")
        print("Continuing without compiler caching...")
        args.use_ccache = False
Member

The diffs in this file are difficult to review due to the indentation changes made to accommodate more exception handling. It might help to first pull some of these sections into functions in one PR/commit and then have another PR/commit wrap them with sccache setup.

Comment on lines +441 to +445
except Exception as e:
    print(f"ERROR: sccache setup failed: {e}")
    print("Falling back to ccache for host code compilation...")
    args.use_sccache = False
    args.use_ccache = True
Member

I'd rather we respect the user's choice here and hard fail instead of falling back to something the user didn't request.

  • If --use-sccache is set but sccache couldn't be set up for some reason, fail.
  • Same for --use-ccache

Contributor

Can we proceed without any cache settings if we don't find sccache, instead of falling back to ccache?

Member

Always prefer visible errors - fail fast: https://github.com/ROCm/TheRock/blob/main/docs/development/style_guides/python_style_guide.md#fail-fast-behavior. We don't want to discover that we've been running for months without a functional cache due to an environment configuration issue that trips the fallback path.

Contributor Author

Done. Changed to hard fail - now raises RuntimeError.

To address Akash's comment: for cases where we want to build without a cache, I introduced a cache_type input for both the Linux and Windows workflows to specify the compiler cache type (sccache, ccache, or none).
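A minimal sketch of the fail-fast behavior combined with the `cache_type` choice might look like the following. The function name `cache_launcher_env` and the injectable `which` parameter are assumptions for illustration, not the code in build_prod_wheels.py.

```python
import shutil
from typing import Callable, Dict, Optional


def cache_launcher_env(
    cache_type: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, str]:
    """Return CMake launcher settings for the requested cache, or fail hard."""
    if cache_type == "none":
        return {}  # explicit opt-out: build without any compiler cache
    if cache_type not in ("sccache", "ccache"):
        raise ValueError(f"Unsupported cache_type: {cache_type!r}")
    path = which(cache_type)
    if path is None:
        # No silent fallback: the requested tool must be present.
        raise RuntimeError(f"{cache_type} was requested but was not found on PATH")
    return {
        "CMAKE_C_COMPILER_LAUNCHER": path,
        "CMAKE_CXX_COMPILER_LAUNCHER": path,
    }
```

The only way to get an uncached build is to ask for `none` explicitly, so a broken cache setup shows up as a failed job rather than a slow one.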

Comment on lines 1087 to 1096
 build_p.add_argument(
     "--use-ccache",
     action=argparse.BooleanOptionalAction,
-    help="Use ccache as the compiler launcher",
+    help="Use ccache as the compiler launcher (for host code only)",
 )
 build_p.add_argument(
     "--use-sccache",
     action=argparse.BooleanOptionalAction,
     help="Use sccache with ROCm compiler wrapping (comprehensive caching for HIP code)",
 )
Member

Let's make --use-ccache and --use-sccache mutually exclusive.

https://docs.python.org/3/library/argparse.html#mutual-exclusion

Contributor Author

Done. Updated to use argparse.add_mutually_exclusive_group()
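For reference, a stand-alone sketch of the mutually exclusive flags using the standard argparse API. The parser here is a simplified top-level parser named for the example, not the real `build_p` subparser.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a parser where --use-ccache and --use-sccache cannot be combined."""
    parser = argparse.ArgumentParser(prog="build_prod_wheels_sketch")
    cache_group = parser.add_mutually_exclusive_group()
    cache_group.add_argument(
        "--use-ccache",
        action=argparse.BooleanOptionalAction,
        help="Use ccache as the compiler launcher (for host code only)",
    )
    cache_group.add_argument(
        "--use-sccache",
        action=argparse.BooleanOptionalAction,
        help="Use sccache with ROCm compiler wrapping",
    )
    return parser
```

With this setup, `make_parser().parse_args(["--use-ccache", "--use-sccache"])` exits with a usage error instead of leaving the script to pick one.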

Comment on lines +405 to +408
def main():
    parser = argparse.ArgumentParser(
        description="Setup sccache to wrap ROCm compilers for PyTorch builds"
    )
Member

I don't see any references to "torch" in this file outside of comments. Can we either

  1. Use scripts provided by pytorch itself
  2. Move this to build_tools/ and share with multiple project builds. We can model the file after https://github.com/ROCm/TheRock/blob/main/build_tools/setup_ccache.py

Comment on lines +246 to +251
- name: Report sccache stats
  if: ${{ !cancelled() }}
  run: |
    echo "sccache Stats:"
    echo "--------------"
    sccache --show-stats || true
Member

This is okay for now, but relating to my other comment about making the sccache setup script more generic (and not specific to pytorch), we have a common pattern for "setup cache" and "report cache stats".

See how build_tools/health_status.py is run here:

# TODO: We shouldn't be using a cache on actual release branches, but it
# really helps for iteration time.
- name: Setup ccache
  run: |
    ./build_tools/setup_ccache.py \
      --config-preset "github-oss-presubmit" \
      --dir "$(dirname $CCACHE_CONFIGPATH)" \
      --local-path "$CACHE_DIR/ccache"
- name: Runner health status
  run: |
    ./build_tools/health_status.py

We could add sccache to the env check, like

  • class CheckCCache(CheckProgram):
        def __init__(self, required=False):
            super().__init__(required)
            self.program = FindCCache()
            self.name = "CCache"
  • class FindCCache(FindProgram):
        def __init__(self):
            super().__init__()
            self.name = "ccache"
            self.get_version()
  • def device_ccache_system(self):
        """
        Returns a pair of string lists that contain information about the ccache on
        the system. If ccache is not installed, strings stating this are returned.
        CCACHE_STAT (= [0]) contains general status about ccache
        CCACHE_CONFIG ( = [1]) contains the ccache config
        """
        ccache = []
        try:
            proc = subprocess.run(
                ["ccache", "-s", "-v"], capture_output=True, text=True, check=True
            )
            ccache.append([proc.stdout.splitlines()])
        except (subprocess.CalledProcessError, FileNotFoundError):
            ccache.append(["Ccache not detected!"])
            ccache.append([""])
            return ccache
        try:
            proc = subprocess.run(
                ["ccache", "--show-config"], capture_output=True, text=True, check=True
            )
            ccache.append([proc.stdout.splitlines()])
        except (subprocess.CalledProcessError, FileNotFoundError):
            ccache.append([""])
        return ccache

(quite a lot of boilerplate that way though...)

Then, on the post-build side of the workflows, we have this code now that could be moved to a similar script:

- name: Report
  if: ${{ !cancelled() }}
  shell: bash
  run: |
    if [ -d "./build" ]; then
      echo "Full SDK du:"
      echo "------------"
      du -h -d 1 build/dist/rocm
      echo "Artifact Archives:"
      echo "------------------"
      ls -lh build/artifacts/*.tar.xz
      echo "Artifacts:"
      echo "----------"
      du -h -d 1 build/artifacts
      echo "CCache Stats:"
      echo "-------------"
      ccache -s -v
      tail -v -n +1 .ccache/compiler_check_cache/* > build/logs/ccache_compiler_check_cache.log
    else
      echo "[ERROR] Build directory ./build does not exist. Skipping report!"
      echo " This should only happen if the CI is cancelled before the build step."
      exit 1
    fi

Contributor Author

@subodh-dubey-amd subodh-dubey-amd Jan 31, 2026

Agreed. Keeping the inline approach for now to limit scope. Created #3189 to track adding sccache to env_check tooling and unifying cache stats reporting as a follow-up.
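As a rough idea of what unified stats reporting could look like, the sketch below wraps the two stats commands already used in the workflows (`sccache --show-stats` and `ccache -s -v`). The names `STATS_COMMANDS` and `cache_stats` are hypothetical and do not reflect the design tracked in the follow-up issue.

```python
import subprocess
from typing import Dict, List

# Stats commands as they appear in the workflow snippets quoted above.
STATS_COMMANDS: Dict[str, List[str]] = {
    "sccache": ["sccache", "--show-stats"],
    "ccache": ["ccache", "-s", "-v"],
}


def cache_stats(tool: str, commands: Dict[str, List[str]] = STATS_COMMANDS) -> str:
    """Return the stats output for a cache tool, or a note if it is unavailable."""
    command = commands[tool]  # unknown tool names raise KeyError
    try:
        proc = subprocess.run(command, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError) as error:
        # Reporting is best-effort, mirroring the `|| true` in the workflow step.
        return f"{tool} stats unavailable: {error}"
    return proc.stdout
```

Reporting stays non-fatal here because it runs after the build; the fail-fast rule discussed elsewhere in this review applies to cache setup, not to the post-build report.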

Comment on lines +64 to +111
def install_sccache() -> Path:
    """Install sccache if not available."""
    sccache_path = find_sccache()
    if sccache_path:
        print(f"Found sccache at: {sccache_path}")
        return sccache_path

    print("sccache not found, attempting to install...")

    if is_windows:
        # Try cargo install
        try:
            subprocess.check_call(["cargo", "install", "sccache"])
            sccache_path = Path.home() / ".cargo" / "bin" / "sccache.exe"
            if sccache_path.exists():
                return sccache_path
        except (subprocess.CalledProcessError, FileNotFoundError):
            pass

        raise RuntimeError(
            "Could not install sccache. Please install it manually:\n"
            " choco install sccache\n"
            " or: cargo install sccache"
        )
    else:
        # Try pip install (sccache is available on PyPI)
        try:
            subprocess.check_call([sys.executable, "-m", "pip", "install", "sccache"])
            sccache_path = find_sccache()
            if sccache_path:
                return sccache_path
        except subprocess.CalledProcessError:
            pass

        # Try cargo install as fallback
        try:
            subprocess.check_call(["cargo", "install", "sccache"])
            sccache_path = Path.home() / ".cargo" / "bin" / "sccache"
            if sccache_path.exists():
                return sccache_path
        except (subprocess.CalledProcessError, FileNotFoundError):
            pass

        raise RuntimeError(
            "Could not install sccache. Please install it manually:\n"
            " pip install sccache\n"
            " or: cargo install sccache"
        )
Member

I don't think this script should do any installing on its own. Our other scripts don't do that, and we should have

  1. Predictable tool installs in our base build environments
  2. Scripts that fail if the environment is not configured as expected

Contributor Author

Done. Removed the install_sccache() function. The script now:

  1. Uses find_sccache() to locate the binary
  2. Fails with RuntimeError if sccache is not found

sccache is now pre-installed via:

  • Linux: Docker image (install_sccache.sh)
  • Windows: choco install sccache in workflow

Member

Only a 17% improvement of build time on Windows is interesting 🤔

In my local builds back in August I was able to get from 40-60 minutes down to 6 minutes with ccache.

Contributor Author

What I tried:

  • CMAKE_HIP_COMPILER_LAUNCHER=sccache → "Compiler not supported" error
  • HIP_CLANG_LAUNCHER=sccache → No improvement
  • Wrapper scripts (like Linux) → Doesn't work on Windows due to toolchain differences

Do you remember the ccache configuration from August? Specifically:

  • Any special environment variables or flags?
  • Local cache or remote storage?

Member

Local cache with just the --use-ccache option to this script, no extra tuning or settings. I didn't run detailed experiments at the time, but I posted as a footnote on pytorch/pytorch#159520 (comment)

By the way, on my machine with ccache, through those build scripts I'm seeing about 40-60 minutes for a cold cache build, 6 minutes on a clean build with 95.80% cache hits, and 1 minute on a rebuild (existing build directory + warm cache).

@ScottTodd ScottTodd requested a review from amd-shiraz January 30, 2026 18:05
@subodh-dubey-amd subodh-dubey-amd force-pushed the users/subodh-dubey-amd/ccache-pytorch branch from 8aa2314 to 9c95c75 Compare February 1, 2026 16:01
…Torch wheels to include sccache and Add TODO for SHA pinning after merge
…Torch wheels to a specific SHA and refine TODO for future updates
Comment on lines -106 to +120
-image: ghcr.io/rocm/therock_build_manylinux_x86_64@sha256:db2b63f938941dde2abc80b734e64b45b9995a282896d513a0f3525d4591d6cb
+# TODO(follow-up PR): Update SHA to main image after Dockerfile changes merge
+image: ghcr.io/rocm/therock_build_manylinux_x86_64@sha256:6e7d49caefd37cdda93487bafde973a683f372d517ca7e5bbb4232ebdcfaca30
Member

Sequence these changes to the dockerfile as their own PRs, following these instructions: https://github.com/ROCm/TheRock/tree/main/dockerfiles#updating-images-used-by-github-actions-workflows

(I only have time to review PRs that are "ready", and this can't be ready by design - it could be marked as draft until the sequence of changes lands)

Member

(see my other comment) Move the Dockerfile changes to their own PR and land them first

https://github.com/ROCm/TheRock/tree/main/dockerfiles#updating-images-used-by-github-actions-workflows

@subodh-dubey-amd subodh-dubey-amd requested review from marbre and removed request for marbre February 11, 2026 07:16
@subodh-dubey-amd subodh-dubey-amd marked this pull request as draft February 11, 2026 13:50
ScottTodd added a commit that referenced this pull request Feb 11, 2026
…3303)

## Motivation

Progress on #3291.

This adds a new `build_portable_linux_pytorch_wheels_ci.yml` workflow
forked from
[`build_portable_linux_pytorch_wheels.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/build_portable_linux_pytorch_wheels.yml).
This new workflow is run as part of our CI pipeline and will help catch
when changes to ROCm break PyTorch source builds. Future work will
expand this to also build other packages, upload the built packages to
S3, and run tests.

This workflow code would have caught the build break reported at
#3042.

## Technical Details

> [!NOTE]
> See #3291 and
https://github.com/ScottTodd/claude-rocm-workspace/blob/main/tasks/active/pytorch-ci.md
for other design considerations.

I'm starting with a narrow scope here to provide _some_ value without
blowing our budget or delaying while we refactor related workflows and
infrastructure code (e.g. moving index page generation server-side,
generating commit manifests at the _start_ of workflows instead of
computing them after the fact and plumbing them through partway through
the jobs)

Specifics:

* Linux only (as a start)
* Non-configurable, always runs (as a start)
* Included for all GPU architectures where `expect_pytorch_failure` is
not set
* Python 3.12 (not full matrix)
* PyTorch release/2.10 branch (not full matrix)
* Only builds 'torch', not 'torchaudio', 'torchvision', 'triton', or
other packages
* Does not upload packages yet
* Does not run tests yet (beyond package sanity checks that `import
torch` works on the build machine)

The build jobs add about 30 minutes of CI time per GPU architecture, and
we are not currently using ccache or sccache
(#3171 will change that)

## Test Plan

* Tested on a known-broken commit (4497f66)
  * https://github.com/ROCm/TheRock/actions/runs/21768200125/job/62810358116 (failed as expected)
* Test on a known-working commit (a001047)
  * https://github.com/ROCm/TheRock/actions/runs/21768071862/job/62813030260 (passed as expected)
* CI jobs on this PR itself, e.g. https://github.com/ROCm/TheRock/actions/runs/21846117572/job/63050058601?pr=3303
    ```
    [41](https://github.com/ROCm/TheRock/actions/runs/21846117572/job/63049474316?pr=3303#step:11:78642)
    Found built wheel: /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl
    ++ Copy /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl -> /home/runner/_work/TheRock/TheRock/output/packages/dist
    +++ Installing built torch:
    ++ Exec [/tmp]$ /opt/python/cp312-cp312/bin/python -m pip install /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl
    +++ Sanity checking installed torch (unavailable is okay on CPU machines):
    ++ Capture [/tmp]$ /opt/python/cp312-cp312/bin/python -c 'import torch; print(torch.cuda.is_available())'
    Sanity check output:
    False
    --- Not build pytorch-audio (no --pytorch-audio-dir)
    --- Not build pytorch-vision (no --pytorch-vision-dir)
    --- Not build apex (no --apex-dir)
    --- Builds all completed
    ```
    ```
    Valid wheel: torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl (222812153 bytes)
    ```

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
brockhargreaves-amd pushed a commit that referenced this pull request Feb 11, 2026
…3303)
subodh-dubey-amd added a commit that referenced this pull request Feb 18, 2026
## Motivation

Preparatory refactor for sccache integration ([PR
#3171](#3171 (comment))).
Addresses [reviewer
feedback](#3171 (comment))
on `build_prod_wheels.py` being hard to review due to a single large
`do_build()` function.

## Technical Details

- Extract core build steps (env setup, Triton, PyTorch, Audio, Vision,
Apex, ccache stats) from `do_build()` into new `_do_build_wheels_core()`
helper.
- `do_build()` now handles only setup/orchestration and delegates to the
helper.
- Replace two redundant `get_rocm_path("root")` calls with the
`rocm_dir` parameter.
- **Pure refactor** — no new args, no sccache logic, no behavioral
changes.

## Test Result
No functional changes — refactored code follows the same execution path
as before.
- https://github.com/ROCm/TheRock/actions/runs/21945223080
After dedicated `_setup_common_build_env()` function:
- https://github.com/ROCm/TheRock/actions/runs/22062404175

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
jammm pushed a commit that referenced this pull request Feb 19, 2026
subodh-dubey-amd added a commit that referenced this pull request Feb 20, 2026
## Motivation

Add sccache support to PyTorch wheel builds for S3-backed distributed
caching. The script is placed in `build_tools/` per reviewer feedback on
#3171 and is modeled after `build_tools/setup_ccache.py`.

Part of sccache PR sequence:
#3369 → #3389 → **this** → workflow wiring.

## Technical Details

- **New: `build_tools/setup_sccache_rocm.py`**: generic sccache ROCm
  helper (CLI + importable):
  - `find_sccache()`: locate binary; hard fail if missing
  - `setup_rocm_sccache()`: wrap clang/clang++ with sccache stubs (Linux
    only)
  - `restore_rocm_compilers()`: undo wrapping

- **Modified: `external-builds/pytorch/build_prod_wheels.py`**:
  - `--use-ccache` / `--use-sccache` mutually exclusive args
  - Both hard-fail with `RuntimeError` if the requested cache tool is not
    found (per review on #3171), with no silent fallback
  - Added explicit ccache availability check (previously would fail with
    an unclear subprocess error)
  - sccache: wrap compilers → set CMake launchers → `try`/`finally` around
    build for guaranteed compiler restore + stats
  - Moved ccache stats reporting into `finally` block for consistent
    reporting on both success and failure
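
A minimal sketch of the argument handling and hard-fail behaviour described above. The flag names `--use-ccache` / `--use-sccache` and the `RuntimeError` come from this description; the helper names `parse_cache_args()` and `require_cache_tool()` are hypothetical, and the lookup via `shutil.which` is an assumption that may differ from what the script actually does.

```python
import argparse
import shutil


def parse_cache_args(argv: list[str]) -> argparse.Namespace:
    """Parse the cache flags; argparse rejects passing both at once."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--use-ccache", action="store_true")
    group.add_argument("--use-sccache", action="store_true")
    return parser.parse_args(argv)


def require_cache_tool(name: str) -> str:
    """Return the tool's path, or raise instead of silently building uncached."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} was requested but was not found on PATH")
    return path
```

Failing early here gives a clear message at startup rather than an unclear subprocess error partway through the build.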

## Test Result

No workflow changes: sccache is wired in but not yet invoked by CI (the
next PR adds the `cache_type` input and AWS config).

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
@subodh-dubey-amd
Contributor Author

Closing this, as the work is now handled in multiple smaller PRs.
PR sequence: #3369 → #3306 → #3389 → #3482 → #3352

@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Feb 20, 2026
@subodh-dubey-amd subodh-dubey-amd deleted the users/subodh-dubey-amd/ccache-pytorch branch February 20, 2026 12:02
subodh-dubey-amd added a commit that referenced this pull request Mar 5, 2026
## Summary

Adds `sccache` with S3 remote storage to all four PyTorch wheel build
workflows, significantly reducing build times through distributed
compiler caching.

**PR sequence:** #3369 → #3306 → #3389 → #3482 → **this** → #3189
(based on reviewer feedback on #3171)

## How It Works

| | Linux | Windows |
|---|---|---|
| **Host C/C++** | CMake compiler launchers | CMake compiler launchers |
| **HIP device code** | Wraps ROCm `clang`/`clang++` with sccache | Not supported |
| **Cleanup** | Restores original compilers via try/finally | N/A |
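
The Linux wrap-and-restore flow in the table can be sketched as below. This is an illustration under stated assumptions, not the code in `setup_sccache_rocm.py`: the function names, the `.orig` backup suffix, and the shell stub format are all hypothetical. It shows only the pattern of moving each compiler aside, putting an sccache stub in its place, and restoring it in a `finally` block.

```python
import os
from pathlib import Path
from typing import Callable

BACKUP_SUFFIX = ".orig"  # hypothetical suffix for the moved-aside compiler


def wrap_compiler(compiler: Path, sccache: Path) -> None:
    """Move the real compiler aside and write an sccache stub in its place."""
    backup = compiler.with_name(compiler.name + BACKUP_SUFFIX)
    os.replace(compiler, backup)
    compiler.write_text(f'#!/bin/sh\nexec "{sccache}" "{backup}" "$@"\n')
    compiler.chmod(compiler.stat().st_mode | 0o111)  # make the stub executable


def restore_compiler(compiler: Path) -> None:
    """Put the real compiler back; a no-op if it was never wrapped."""
    backup = compiler.with_name(compiler.name + BACKUP_SUFFIX)
    if backup.exists():
        os.replace(backup, compiler)


def build_with_wrapped_compilers(
    bin_dir: Path, sccache: Path, build: Callable[[], None]
) -> None:
    """Wrap clang/clang++, run the build, and always restore afterwards."""
    compilers = [bin_dir / "clang", bin_dir / "clang++"]
    try:
        for compiler in compilers:
            wrap_compiler(compiler, sccache)
        build()
    finally:
        # Runs on success, on build failure, and on a failed wrap.
        for compiler in compilers:
            restore_compiler(compiler)
```

Restoring every compiler in `finally`, rather than only those known to be wrapped, means a failure halfway through wrapping still leaves the SDK in its original state.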

Cache is stored in the `therock-<workflow>-pytorch-sccache` S3 bucket,
keyed by `<os>/<arch>/` prefix.

## S3 Cache Configuration

Each workflow uses a dedicated S3 bucket and IAM role, keyed by
`<os>/<arch>/` prefix:

| Workflow | S3 Bucket | IAM Role |
|----------|-----------|----------|
| Linux CI | `therock-ci-pytorch-sccache` | `therock-ci` |
| Windows CI | `therock-ci-pytorch-sccache` | `therock-ci` |
| Linux Release | `therock-{release_type}-pytorch-sccache` | `therock-{release_type}` |
| Windows Release | `therock-{release_type}-pytorch-sccache` | `therock-{release_type}` |

Where `release_type` is one of: `dev`, `nightly`, `prerelease`.
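
As a rough illustration of the naming scheme above, the sccache environment for one build could be assembled like this. The function `sccache_s3_env()` and its parameters are hypothetical; the bucket pattern and `<os>/<arch>/` prefix follow the table, and `SCCACHE_BUCKET`, `SCCACHE_REGION`, and `SCCACHE_S3_KEY_PREFIX` are the sccache settings used for S3 storage. The region is left as a caller-supplied value because it is not stated here.

```python
# "ci" for CI workflows, otherwise one of the release types listed above.
VALID_WORKFLOWS = {"ci", "dev", "nightly", "prerelease"}


def sccache_s3_env(workflow: str, os_name: str, arch: str, region: str) -> dict[str, str]:
    """Build the sccache S3 environment variables for one workflow/platform."""
    if workflow not in VALID_WORKFLOWS:
        raise ValueError(f"unknown workflow or release type: {workflow!r}")
    return {
        "SCCACHE_BUCKET": f"therock-{workflow}-pytorch-sccache",
        "SCCACHE_REGION": region,
        # Separate prefixes keep Linux and Windows objects, and different
        # architectures, from colliding within the same bucket.
        "SCCACHE_S3_KEY_PREFIX": f"{os_name}/{arch}/",
    }
```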

## Impact

| Platform | Cold → Warm | Improvement |
|----------|------------|-------------|
| Linux | ~70m → ~37m | **~47%** |
| Windows | ~42m → ~26m | **~38%** |

The Windows improvement is smaller because sccache cannot wrap HIP
device compilation on Windows; only host C/C++ is cached, via the CMake
compiler launchers.
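
The improvement figures are relative reductions in wall-clock build time. As a rough check, they can be recomputed from the rounded job durations listed under Tests (70 and 37 minutes on Linux, 42 and 26 minutes on Windows). Because those inputs are rounded, the result may differ from the reported percentages by a point or two. The function name is hypothetical.

```python
def time_reduction_percent(cold_minutes: float, warm_minutes: float) -> float:
    """Relative reduction in build time from a cold cache to a warm cache."""
    return 100.0 * (cold_minutes - warm_minutes) / cold_minutes


linux = time_reduction_percent(70, 37)
windows = time_reduction_percent(42, 26)
print(f"Linux: {linux:.1f}%  Windows: {windows:.1f}%")
```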

## Tests

### Linux:
- [Linux (Cache Population)](https://github.com/ROCm/TheRock/actions/runs/22226347964/job/64293924748): 70 mins
- [Linux (Cache Hit)](https://github.com/ROCm/TheRock/actions/runs/22231743387/job/64312966557): 37 mins

### Windows:
- [Windows (Cache Population)](https://github.com/ROCm/TheRock/actions/runs/22219252671/job/64280583887): 42 mins
- [Windows (Cache Hit)](https://github.com/ROCm/TheRock/actions/runs/22223608689/job/64284721704): 26 mins

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.



> Forks: S3 caching is only active for ROCm-owned runs. Fork users can
> set `cache_type` to `ccache` or `none`, or leave the default; sccache
> will work locally without S3 access.

---------