Add sccache support for PyTorch builds by subodh-dubey-amd · Pull Request #3171 · ROCm/TheRock

subodh-dubey-amd · 2026-01-30T12:39:17Z

Summary

Adds sccache with AWS S3 remote storage for PyTorch wheel builds, significantly reducing build times through distributed compiler caching.

Key Features

S3-backed remote cache: Shared cache across CI runs using therock-pytorch-sccache bucket
Platform-specific cache keys: Organized by linux/<arch>/ and windows/<arch>/ prefixes
ROCm compiler wrapping (Linux): Wraps clang/clang++ in ROCm SDK with sccache for HIP compilation caching
CMake launcher integration: Uses CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER for host code caching
Automatic cleanup: Restores original compilers after build via try/finally block
Robust error handling: Safe wrapper creation with atomic operations and rollback on failure

How It Works

Linux

Downloads and installs sccache binary
Wraps ROCm's clang/clang++ with sccache wrapper scripts
Sets CMake compiler launchers for host code
Caches compilation artifacts to S3
Restores original compilers on completion
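
The Linux steps above (wrap the compilers, build, always restore) can be sketched as a context manager. This is a minimal illustration, not the code from setup_sccache_rocm.py: the name `wrapped_compilers`, the `.real` backup suffix, and the shell stub contents are all assumptions made for the example.

```python
import contextlib
import os
import stat
from pathlib import Path


@contextlib.contextmanager
def wrapped_compilers(bin_dir: Path, sccache: str, names=("clang", "clang++")):
    """Temporarily replace each compiler with a stub that calls sccache.

    Hypothetical sketch: the backup suffix and stub layout are assumptions.
    """
    moved = []  # (original path, backup path) pairs that need restoring
    try:
        for name in names:
            original = bin_dir / name
            backup = bin_dir / f"{name}.real"
            # Rename within one directory so the move is a single atomic step.
            os.replace(original, backup)
            moved.append((original, backup))
            # The stub forwards every argument to the real compiler via sccache.
            original.write_text(f'#!/bin/sh\nexec "{sccache}" "{backup}" "$@"\n')
            original.chmod(original.stat().st_mode | stat.S_IXUSR)
        yield
    finally:
        # Runs on success and on failure, so a partial wrap is rolled back too.
        for original, backup in reversed(moved):
            os.replace(backup, original)
```

A real toolchain likely needs more care than this sketch shows, for example symlinked compiler binaries and clang selecting its driver mode from the executable name.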

Windows

Downloads and installs sccache.exe
Sets CMake compiler launchers for C/C++ host code caching
Caches compilation artifacts to S3

Configuration

Environment variables (set in workflow):

SCCACHE_BUCKET: S3 bucket name
SCCACHE_REGION: AWS region
SCCACHE_S3_KEY_PREFIX: Cache key prefix (os/arch)
SCCACHE_S3_SERVER_SIDE_ENCRYPTION: Enabled
SCCACHE_LOG: Set to warn for error/warning visibility
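
As a rough sketch, the environment listed above could be assembled like this. The helper name `sccache_s3_env` is hypothetical, the value `"true"` for server-side encryption is an assumption (the description only says "Enabled"), and the region is taken from the AWS credentials step quoted in the review.

```python
def sccache_s3_env(bucket: str, region: str, os_name: str, arch: str) -> dict[str, str]:
    """Build the sccache S3 settings named in the Configuration list."""
    return {
        "SCCACHE_BUCKET": bucket,
        "SCCACHE_REGION": region,
        # Cache key prefix in os/arch form, e.g. linux/gfx110X.
        "SCCACHE_S3_KEY_PREFIX": f"{os_name}/{arch}",
        # Assumed value; the PR description only states "Enabled".
        "SCCACHE_S3_SERVER_SIDE_ENCRYPTION": "true",
        "SCCACHE_LOG": "warn",
    }


env = sccache_s3_env("therock-pytorch-sccache", "us-east-2", "linux", "gfx110X")
```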

Files Changed

.github/workflows/build_portable_linux_pytorch_wheels.yml - Linux workflow with sccache config
.github/workflows/build_windows_pytorch_wheels.yml - Windows workflow with sccache config
external-builds/pytorch/build_prod_wheels.py - Build script with sccache integration
external-builds/pytorch/setup_sccache_rocm.py - New module for sccache setup and compiler wrapping

Testing

Linux release builds (gfx110X)
Linux nightly builds
Windows release builds (gfx110X)
Windows nightly - blocked by ROCm SDK 7.12.0 issue (unrelated to this PR)

Known Limitations

Windows HIP device code: Not cached (sccache doesn't support HIP compiler launcher on Windows)
Windows nightly: Failing due to a ROCm SDK 7.12.0 bug with HIP compiler detection (CMake passes MSVC linker flags to a GNU-like compiler). This is a pre-existing infrastructure issue, not caused by this PR, as of 2026-01-30.
https://github.com/ROCm/TheRock/actions/runs/21508193319

Run 1 ( Cache Population )

Run 2 ( Cache Hit )

Linux PyTorch Build Times

Release	Run 1 (Cache Population)	Run 2 (Cache Hit)	Time Saved	Improvement
release/2.7	40m	22m	18m	45%
release/2.8	50-51m	26-27m	24m	48%
release/2.9	48-49m	23-24m	25m	52%
release/2.10	52-53m	27-29m	24m	47%
nightly	53-54m	28m	25m	47%

Linux Average: ~48% improvement

Windows PyTorch Build Times

Release	Run 1 (Cache Population)	Run 2 (Cache Hit)	Time Saved	Improvement
release/2.9	71-79m	58-59m	15m	19%
release/2.10	64-72m	53-59m	11m	16%
nightly	❌ Failed	❌ Failed	-	-

Windows Average: ~17% improvement (release builds only)

Summary: Build Time Improvements

Platform	Cache Population	Cache Hit	Improvement
Linux	~50m avg	~25m avg	~48%
Windows	~72m avg	~57m avg	~17%

Times vary based on cache hit rate and code changes
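
For reference, the "Time Saved" and "Improvement" columns follow from simple arithmetic on the two run times. The sketch below reproduces the release/2.7 Linux row; the function name `improvement` is made up for the example.

```python
def improvement(cold_minutes: float, warm_minutes: float) -> tuple[float, float]:
    """Return (minutes saved, percent improvement) for a cold vs. warm build."""
    saved = cold_minutes - warm_minutes
    return saved, 100.0 * saved / cold_minutes


# Linux release/2.7: 40m cache population run, 22m cache hit run.
saved, pct = improvement(40, 22)
print(f"{saved:.0f}m saved, {pct:.0f}% improvement")  # prints "18m saved, 45% improvement"
```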

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

ScottTodd

Cool, I think this is heading in a good direction.

ScottTodd · 2026-01-30T17:41:53Z

+      - name: Configure AWS Credentials for sccache
+        if: ${{ github.repository_owner == 'ROCm' }}
+        uses: aws-actions/configure-aws-credentials@61815dcd50bd041e203e49132bacad1fd04d2708 # v5.1.1
+        with:
+          aws-region: us-east-2
+          role-to-assume: arn:aws:iam::692859939525:role/therock-${{ inputs.release_type }}


These are the same roles as we use for uploading release files (python packages, artifacts). Do we want a separate role for using sccache?

Is the therock-pytorch-sccache bucket public read but private write, or private for both?

cc @marbre

The bucket is private and blocks all public access. It is accessible only through these roles: role-to-assume: arn:aws:iam::692859939525:role/therock-${{ inputs.release_type }}

Using the existing release role because:
The sccache operations (S3 GetObject/PutObject) are a subset of what the arn:aws:iam::692859939525:role/therock-${{ inputs.release_type }} roles already have

Need @marbre and @amd-shiraz to weigh in here (and perhaps @amd-justchen too, given his prior work on the ccache server we use for building ROCm).

We need a clear policy written down for how cache buckets and access are handled. On other projects we've made these decisions:

CI cache buckets are world readable so developers can benefit from the CI cache

workflows running on schedule or push can read and write to the cache

workflows running on pull_request can only read from the cache

(what about workflow_dispatch?)

I'd like to apply the same policies for PyTorch and ROCm builds, so we aren't dealing with an explosion of different settings when we also enable caching for JAX and other projects.
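
The policy described above can be written down as a small lookup. This is only a sketch of the rules as stated in this comment: the names `CacheAccess` and `cache_access_for_event` are hypothetical, and workflow_dispatch is deliberately left undecided because the comment leaves it as an open question.

```python
from enum import Enum


class CacheAccess(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


def cache_access_for_event(event_name: str) -> CacheAccess:
    """Map a GitHub Actions event name to the cache access it should get."""
    if event_name in ("schedule", "push"):
        return CacheAccess.READ_WRITE
    if event_name == "pull_request":
        return CacheAccess.READ_ONLY
    # workflow_dispatch (and anything else) has no agreed policy yet,
    # so refuse to guess rather than silently pick one.
    raise ValueError(f"no cache policy defined for event {event_name!r}")
```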

Note that I also have #3303 open which creates a new workflow for building pytorch on CI. That will be the main place that a build cache will be needed. Having a cache for dev or nightly release builds is more of a nice-to-have given the reduced job frequency and lower bar for build cache integrity.

Also another reminder: keep PRs small and focused. This takes weeks to review because the change does multiple things at once, and each related piece has open design questions.

PR sequence:

Add sccache to dockerfiles

Set workflows to use new dockerfiles

Add sccache support to build scripts

Have workflows use the new sccache support

Each of those would have significantly shorter review turnaround time.

ScottTodd · 2026-01-30T17:43:20Z

+      - name: Install sccache
+        run: |
+          pip install sccache
+          sccache --version


Let's get sccache into our build dockerfile, similar to ccache:

TheRock/dockerfiles/build_manylinux_x86_64.Dockerfile

Lines 27 to 30 in 9e9c726

    ######## CCache ########
    WORKDIR /install-ccache
    COPY install_ccache.sh ./
    RUN ./install_ccache.sh "4.11.2" && rm -rf /install-ccache

https://github.com/ROCm/TheRock/blob/main/dockerfiles/install_ccache.sh

https://github.com/ROCm/TheRock/blob/main/dockerfiles/README.md#updating-images-used-by-github-actions-workflows

I don't trust a pip install in this workflow prior to the two steps below that select a python version and put that python version on PATH.

I agree we should get this into our base image

Done. Added sccache installation to the Docker image via dockerfiles/install_sccache.sh (similar pattern to ccache installation).

ScottTodd · 2026-01-30T17:44:22Z

      S3_BUCKET_PY: "therock-${{ inputs.release_type }}-python"
      optional_build_prod_arguments: ""
+      # sccache configuration for ROCm compiler caching with S3 backend
+      SCCACHE_BUCKET: therock-pytorch-sccache


We may want separate cache buckets (or namespaces) for dev, nightly, and stable releases.

Updated to use separate buckets per environment:

therock-dev-pytorch-sccache

therock-nightly-pytorch-sccache

therock-prerelease-pytorch-sccache

Each environment's IAM role (therock-dev, therock-nightly, therock-prerelease) has access only to its corresponding bucket.
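
The naming scheme in this reply is regular enough to express as a pair of helpers. The function names below are hypothetical; only the bucket and role name patterns come from the comment.

```python
RELEASE_TYPES = ("dev", "nightly", "prerelease")


def sccache_bucket(release_type: str) -> str:
    """Return the per-environment sccache bucket name."""
    if release_type not in RELEASE_TYPES:
        raise ValueError(f"unknown release type: {release_type!r}")
    return f"therock-{release_type}-pytorch-sccache"


def iam_role(release_type: str) -> str:
    """Return the IAM role name that is scoped to the matching bucket."""
    if release_type not in RELEASE_TYPES:
        raise ValueError(f"unknown release type: {release_type!r}")
    return f"therock-{release_type}"
```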

Check with @marbre for these bucket configurations and role settings. This TheRock repository will retain access to the "dev" role but nightly and prerelease are moving to https://github.com/ROCm/rockrel.

@marbre
Each environment's IAM role (therock-dev, therock-nightly, therock-prerelease) has access only to its corresponding bucket.

therock-dev-pytorch-sccache

therock-nightly-pytorch-sccache

therock-prerelease-pytorch-sccache

Attached the dev role policy screenshot. Do we need any changes here?

ScottTodd · 2026-01-30T17:46:45Z

+        except Exception as e:
+            print(f"ERROR: sccache setup failed: {e}")
+            print("Falling back to ccache for host code compilation...")
+            args.use_sccache = False
+            args.use_ccache = True
+            env["CMAKE_C_COMPILER_LAUNCHER"] = "ccache"
+            env["CMAKE_CXX_COMPILER_LAUNCHER"] = "ccache"
+            try:
+                run_command(["ccache", "--zero-stats"], cwd=tempfile.gettempdir())
+            except Exception as ccache_error:
+                print(f"WARNING: ccache fallback also failed: {ccache_error}")
+                print("Continuing without compiler caching...")
+                args.use_ccache = False


The diffs in this file are difficult to review due to the indentation changes needed to accommodate more exception handling. It might help to first pull some of these sections into functions in one PR/commit and then have another PR/commit wrap them with sccache setup.

ScottTodd · 2026-01-30T17:47:54Z

+        except Exception as e:
+            print(f"ERROR: sccache setup failed: {e}")
+            print("Falling back to ccache for host code compilation...")
+            args.use_sccache = False
+            args.use_ccache = True


I'd rather we respect the user's choice here and hard fail instead of falling back to something the user didn't request.

If --use-sccache is set but sccache couldn't be set up for some reason, fail.

Same for --use-ccache

Can we proceed without any cache settings if sccache isn't found, instead of falling back to ccache?

Always prefer visible errors - fail fast: https://github.com/ROCm/TheRock/blob/main/docs/development/style_guides/python_style_guide.md#fail-fast-behavior. We don't want to discover that we've been running for months without a functional cache due to an environment configuration issue that trips the fallback path.

Done. Changed to hard fail - now raises RuntimeError.

To address Akash's comment about wanting a way to build without a cache: introduced a cache_type input for both Linux and Windows workflows to specify the compiler cache type (sccache, ccache, or none).
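
The fail-fast behavior agreed on in this thread might look roughly like the sketch below: the requested cache tool is either found or the build stops, and "none" is the only way to build without a cache. The function name `launcher_env` and the injectable `which` parameter are assumptions for illustration, not the code in build_prod_wheels.py.

```python
import shutil
from typing import Callable, Optional


def launcher_env(
    cache_type: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, str]:
    """Return CMake launcher settings for the requested cache, or fail."""
    if cache_type == "none":
        return {}
    if cache_type not in ("sccache", "ccache"):
        raise ValueError(f"unknown cache_type: {cache_type!r}")
    path = which(cache_type)
    if path is None:
        # No silent fallback to another tool: surface the problem now.
        raise RuntimeError(f"{cache_type} was requested but was not found on PATH")
    return {
        "CMAKE_C_COMPILER_LAUNCHER": path,
        "CMAKE_CXX_COMPILER_LAUNCHER": path,
    }
```

Passing `which` as a parameter keeps the lookup testable without depending on what is installed on the machine.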

ScottTodd · 2026-01-30T17:50:23Z

    build_p.add_argument(
        "--use-ccache",
        action=argparse.BooleanOptionalAction,
-        help="Use ccache as the compiler launcher",
+        help="Use ccache as the compiler launcher (for host code only)",
+    )
+    build_p.add_argument(
+        "--use-sccache",
+        action=argparse.BooleanOptionalAction,
+        help="Use sccache with ROCm compiler wrapping (comprehensive caching for HIP code)",
    )


Let's make --use-ccache and --use-sccache mutually exclusive.

https://docs.python.org/3/library/argparse.html#mutual-exclusion

Done. Updated to use argparse.add_mutually_exclusive_group()
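
A minimal standalone sketch of that change, assuming a plain top-level parser rather than the `build_p` subparser used in the real script (the `build_parser` name and `prog` value are made up here):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build_prod_wheels_sketch")
    # Options in the same group cannot be combined on one command line.
    cache_group = parser.add_mutually_exclusive_group()
    cache_group.add_argument(
        "--use-ccache",
        action=argparse.BooleanOptionalAction,
        help="Use ccache as the compiler launcher (for host code only)",
    )
    cache_group.add_argument(
        "--use-sccache",
        action=argparse.BooleanOptionalAction,
        help="Use sccache with ROCm compiler wrapping",
    )
    return parser


args = build_parser().parse_args(["--use-sccache"])
```

With this grouping, passing both `--use-ccache` and `--use-sccache` makes argparse print a usage error and exit instead of letting one option silently win.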

ScottTodd · 2026-01-30T17:52:08Z

+def main():
+    parser = argparse.ArgumentParser(
+        description="Setup sccache to wrap ROCm compilers for PyTorch builds"
+    )


I don't see any references to "torch" in this file outside of comments. Can we either

Use scripts provided by pytorch itself

Move this to build_tools/ and share with multiple project builds. We can model the file after https://github.com/ROCm/TheRock/blob/main/build_tools/setup_ccache.py

ScottTodd · 2026-01-30T17:56:46Z

+      - name: Report sccache stats
+        if: ${{ !cancelled() }}
+        run: |
+          echo "sccache Stats:"
+          echo "--------------"
+          sccache --show-stats || true


This is okay for now, but relating to my other comment about making the sccache setup script more generic (and not specific to pytorch), we have a common pattern for "setup cache" and "report cache stats".

See how build_tools/health_status.py is run here:

TheRock/.github/workflows/build_portable_linux_artifacts.yml

Lines 95 to 106 in 9e9c726

    # TODO: We shouldn't be using a cache on actual release branches, but it
    # really helps for iteration time.
    - name: Setup ccache
      run: |
        ./build_tools/setup_ccache.py \
          --config-preset "github-oss-presubmit" \
          --dir "$(dirname $CCACHE_CONFIGPATH)" \
          --local-path "$CACHE_DIR/ccache"
    - name: Runner health status
      run: |
        ./build_tools/health_status.py

We could add sccache to the env check, like

TheRock/build_tools/hack/env_check/check_tools.py

Lines 328 to 332 in 9e9c726

    class CheckCCache(CheckProgram):
        def __init__(self, required=False):
            super().__init__(required)
            self.program = FindCCache()
            self.name = "CCache"

TheRock/build_tools/hack/env_check/find_tools.py

Lines 191 to 195 in 9e9c726

    class FindCCache(FindProgram):
        def __init__(self):
            super().__init__()
            self.name = "ccache"
            self.get_version()

TheRock/build_tools/hack/env_check/device.py

Lines 421 to 450 in 9e9c726

    def device_ccache_system(self):
        """
        Returns a pair of string lists that contain information about the ccache on
        the system. If ccache is not installed, strings stating this are returned.

        CCACHE_STAT (= [0]) contains general status about ccache
        CCACHE_CONFIG ( = [1]) contains the ccache config
        """
        ccache = []
        try:
            proc = subprocess.run(
                ["ccache", "-s", "-v"], capture_output=True, text=True, check=True
            )
            ccache.append([proc.stdout.splitlines()])
        except (subprocess.CalledProcessError, FileNotFoundError):
            ccache.append(["Ccache not detected!"])
            ccache.append([""])
            return ccache

        try:
            proc = subprocess.run(
                ["ccache", "--show-config"], capture_output=True, text=True, check=True
            )
            ccache.append([proc.stdout.splitlines()])
        except (subprocess.CalledProcessError, FileNotFoundError):
            ccache.append([""])

        return ccache

(quite a lot of boilerplate that way though...)

Then, on the post-build side of the workflows, we have this code now that could be moved to a similar script:

TheRock/.github/workflows/build_portable_linux_artifacts.yml

Lines 132 to 154 in 9e9c726

    - name: Report
      if: ${{ !cancelled() }}
      shell: bash
      run: |
        if [ -d "./build" ]; then
          echo "Full SDK du:"
          echo "------------"
          du -h -d 1 build/dist/rocm
          echo "Artifact Archives:"
          echo "------------------"
          ls -lh build/artifacts/*.tar.xz
          echo "Artifacts:"
          echo "----------"
          du -h -d 1 build/artifacts
          echo "CCache Stats:"
          echo "-------------"
          ccache -s -v
          tail -v -n +1 .ccache/compiler_check_cache/* > build/logs/ccache_compiler_check_cache.log
        else
          echo "[ERROR] Build directory ./build does not exist. Skipping report!"
          echo " This should only happen if the CI is cancelled before the build step."
          exit 1
        fi

Agreed. Keeping the inline approach for now to limit scope. Created #3189 to track adding sccache to env_check tooling and unifying cache stats reporting as a follow-up.

ScottTodd · 2026-01-30T17:58:34Z

+def install_sccache() -> Path:
+    """Install sccache if not available."""
+    sccache_path = find_sccache()
+    if sccache_path:
+        print(f"Found sccache at: {sccache_path}")
+        return sccache_path
+
+    print("sccache not found, attempting to install...")
+
+    if is_windows:
+        # Try cargo install
+        try:
+            subprocess.check_call(["cargo", "install", "sccache"])
+            sccache_path = Path.home() / ".cargo" / "bin" / "sccache.exe"
+            if sccache_path.exists():
+                return sccache_path
+        except (subprocess.CalledProcessError, FileNotFoundError):
+            pass
+
+        raise RuntimeError(
+            "Could not install sccache. Please install it manually:\n"
+            "  choco install sccache\n"
+            "  or: cargo install sccache"
+        )
+    else:
+        # Try pip install (sccache is available on PyPI)
+        try:
+            subprocess.check_call([sys.executable, "-m", "pip", "install", "sccache"])
+            sccache_path = find_sccache()
+            if sccache_path:
+                return sccache_path
+        except subprocess.CalledProcessError:
+            pass
+
+        # Try cargo install as fallback
+        try:
+            subprocess.check_call(["cargo", "install", "sccache"])
+            sccache_path = Path.home() / ".cargo" / "bin" / "sccache"
+            if sccache_path.exists():
+                return sccache_path
+        except (subprocess.CalledProcessError, FileNotFoundError):
+            pass
+
+        raise RuntimeError(
+            "Could not install sccache. Please install it manually:\n"
+            "  pip install sccache\n"
+            "  or: cargo install sccache"
+        )


I don't think this script should do any installing on its own. Our other scripts don't do that, and we should have

Predictable tool installs in our base build environments

Scripts that fail if the environment is not configured as expected

Done. Removed the install_sccache() function. The script now:

Uses find_sccache() to locate the binary

Fails with RuntimeError if sccache is not found

sccache is now pre-installed via:

Linux: Docker image (install_sccache.sh)

Windows: choco install sccache in workflow

ScottTodd · 2026-01-30T18:04:22Z

Only a 17% improvement of build time on Windows is interesting 🤔

In my local builds back in August I was able to get from 40-60 minutes down to 6 minutes with ccache.

What I tried:

CMAKE_HIP_COMPILER_LAUNCHER=sccache → "Compiler not supported" error

HIP_CLANG_LAUNCHER=sccache → No improvement

Wrapper scripts (like Linux) → Doesn't work on Windows due to toolchain differences

Do you remember the ccache configuration from August? Specifically:

Any special environment variables or flags?

Local cache or remote storage?

Local cache with just the --use-ccache option to this script, no extra tuning or settings. I didn't run detailed experiments at the time, but I posted as a footnote on pytorch/pytorch#159520 (comment)

By the way, on my machine with ccache, through those build scripts I'm seeing about 40-60 minutes for a cold cache build, 6 minutes on a clean build with 95.80% cache hits, and 1 minute on a rebuild (existing build directory + warm cache).

ScottTodd · 2026-02-10T21:10:05Z

-      image: ghcr.io/rocm/therock_build_manylinux_x86_64@sha256:db2b63f938941dde2abc80b734e64b45b9995a282896d513a0f3525d4591d6cb
+      # TODO(follow-up PR): Update SHA to main image after Dockerfile changes merge
+      image: ghcr.io/rocm/therock_build_manylinux_x86_64@sha256:6e7d49caefd37cdda93487bafde973a683f372d517ca7e5bbb4232ebdcfaca30


Sequence these changes to the dockerfile as their own PRs, following these instructions: https://github.com/ROCm/TheRock/tree/main/dockerfiles#updating-images-used-by-github-actions-workflows

(I only have time to review PRs that are "ready", and this can't be ready by design - it could be marked as draft until the sequence of changes lands)

ScottTodd · 2026-02-10T21:10:34Z

(see my other comment) Move the Dockerfile changes to their own PR and land them first

https://github.com/ROCm/TheRock/tree/main/dockerfiles#updating-images-used-by-github-actions-workflows

…3303)

## Motivation

Progress on #3291.

This adds a new `build_portable_linux_pytorch_wheels_ci.yml` workflow forked from [`build_portable_linux_pytorch_wheels.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/build_portable_linux_pytorch_wheels.yml). This new workflow is run as part of our CI pipeline and will help catch when changes to ROCm break PyTorch source builds. Future work will expand this to also build other packages, upload the built packages to S3, and run tests.

This workflow code would have caught the build break reported at #3042.

## Technical Details

> [!NOTE]
> See #3291 and https://github.com/ScottTodd/claude-rocm-workspace/blob/main/tasks/active/pytorch-ci.md for other design considerations.

I'm starting with a narrow scope here to provide _some_ value without blowing our budget or delaying while we refactor related workflows and infrastructure code (e.g. moving index page generation server-side, generating commit manifests at the _start_ of workflows instead of computing them after the fact and plumbing them through partway through the jobs)

Specifics:

* Linux only (as a start)
* Non-configurable, always runs (as a start)
* Included for all GPU architectures where `expect_pytorch_failure` is not set
* Python 3.12 (not full matrix)
* PyTorch release/2.10 branch (not full matrix)
* Only builds 'torch', not 'torchaudio', 'torchvision', 'triton', or other packages
* Does not upload packages yet
* Does not run tests yet (beyond package sanity checks that `import torch` works on the build machine)

The build jobs add about 30 minutes of CI time per GPU architecture, and we are not currently using ccache or sccache (#3171 will change that)

## Test Plan

* Tested on a known-broken commit (4497f66)
  * https://github.com/ROCm/TheRock/actions/runs/21768200125/job/62810358116 (failed as expected)
* Test on a known-working commit (a001047)
  * https://github.com/ROCm/TheRock/actions/runs/21768071862/job/62813030260 (passed as expected)
* CI jobs on this PR itself, e.g. https://github.com/ROCm/TheRock/actions/runs/21846117572/job/63050058601?pr=3303

```
[41](https://github.com/ROCm/TheRock/actions/runs/21846117572/job/63049474316?pr=3303#step:11:78642) Found built wheel: /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl
++ Copy /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl -> /home/runner/_work/TheRock/TheRock/output/packages/dist
+++ Installing built torch:
++ Exec [/tmp]$ /opt/python/cp312-cp312/bin/python -m pip install /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl
+++ Sanity checking installed torch (unavailable is okay on CPU machines):
++ Capture [/tmp]$ /opt/python/cp312-cp312/bin/python -c 'import torch; print(torch.cuda.is_available())'
Sanity check output: False
--- Not build pytorch-audio (no --pytorch-audio-dir)
--- Not build pytorch-vision (no --pytorch-vision-dir)
--- Not build apex (no --apex-dir)
--- Builds all completed
```

```
Valid wheel: torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl (222812153 bytes)
```

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>

## Motivation

Preparatory refactor for sccache integration ([PR #3171](#3171 (comment))). Addresses [reviewer feedback](#3171 (comment)) on `build_prod_wheels.py` being hard to review due to a single large `do_build()` function.

## Technical Details

- Extract core build steps (env setup, Triton, PyTorch, Audio, Vision, Apex, ccache stats) from `do_build()` into new `_do_build_wheels_core()` helper.
- `do_build()` now handles only setup/orchestration and delegates to the helper.
- Replace two redundant `get_rocm_path("root")` calls with the `rocm_dir` parameter.
- **Pure refactor**: no new args, no sccache logic, no behavioral changes.

## Test Result

No functional changes: refactored code follows the same execution path as before.

- https://github.com/ROCm/TheRock/actions/runs/21945223080

After dedicated `_setup_common_build_env()` function:

- https://github.com/ROCm/TheRock/actions/runs/22062404175

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

## Motivation

Add sccache support to PyTorch wheel builds for S3-backed distributed caching. Script placed in `build_tools/` per [reviewer feedback](#3171 (comment)), modeled after `build_tools/setup_ccache.py`.

Part of sccache PR sequence: [#3369](#3369) → [#3389](#3389) → **this** → workflow wiring.

## Technical Details

- **New: `build_tools/setup_sccache_rocm.py`**: generic sccache ROCm helper (CLI + importable):
  - `find_sccache()`: locate binary; hard fail if missing
  - `setup_rocm_sccache()`: wrap clang/clang++ with sccache stubs (Linux only)
  - `restore_rocm_compilers()`: undo wrapping
- **Modified: `external-builds/pytorch/build_prod_wheels.py`**:
  - `--use-ccache` / `--use-sccache` mutually exclusive args
  - Both hard-fail with `RuntimeError` if the requested cache tool is not found ([per review](#3171 (comment))), with no silent fallback
  - Added explicit ccache availability check (previously would fail with an unclear subprocess error)
  - sccache: wrap compilers → set CMAKE launchers → `try`/`finally` around build for guaranteed compiler restore + stats
  - Moved ccache stats reporting into `finally` block for consistent reporting on both success and failure

## Test Result

No workflow changes: sccache wired but not yet invoked by CI (next PR adds `cache_type` input + AWS config).

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

subodh-dubey-amd · 2026-02-20T12:02:04Z

Closing this, as it is now handled in multiple smaller PRs.
PR sequence: #3369 → #3306 → #3389 → #3482 → #3352

## Summary

Adds `sccache` with S3 remote storage to all four PyTorch wheel build workflows, significantly reducing build times through distributed compiler caching.

**PR sequence:** #3369 → #3306 → #3389 → #3482 → **this** → #3189 ([based on Reviewer's Feedback](#3171 (comment)))

## How It Works

| | Linux | Windows |
|---|---|---|
| **Host C/C++** | CMake compiler launchers | CMake compiler launchers |
| **HIP device code** | Wraps ROCm `clang`/`clang++` with sccache | Not supported |
| **Cleanup** | Restores original compilers via try/finally | N/A |

Cache is stored in the `therock-<workflow>-pytorch-sccache` S3 bucket, keyed by `<os>/<arch>/` prefix.

## S3 Cache Configuration

Each workflow uses a dedicated S3 bucket and IAM role, keyed by `<os>/<arch>/` prefix:

| Workflow | S3 Bucket | IAM Role |
|----------|-----------|----------|
| Linux CI | `therock-ci-pytorch-sccache` | `therock-ci` |
| Windows CI | `therock-ci-pytorch-sccache` | `therock-ci` |
| Linux Release | `therock-{release_type}-pytorch-sccache` | `therock-{release_type}` |
| Windows Release | `therock-{release_type}-pytorch-sccache` | `therock-{release_type}` |

Where `release_type` is one of: `dev`, `nightly`, `prerelease`.

## Impact

| Platform | Cold → Warm | Improvement |
|----------|------------|-------------|
| Linux | ~70m → ~37m | **~49%** |
| Windows | ~42m → ~26m | **~38%** |

Windows is lower: sccache cannot wrap HIP device compilation on Windows, only host C/C++ via CMAKE launchers.

## Tests

### Linux:

- [Linux (Cache Population)](https://github.com/ROCm/TheRock/actions/runs/22226347964/job/64293924748) - 70 mins
- [Linux (Cache Hit)](https://github.com/ROCm/TheRock/actions/runs/22231743387/job/64312966557) - 37 mins

### Windows:

- [Windows (Cache Population)](https://github.com/ROCm/TheRock/actions/runs/22219252671/job/64280583887) - 42 mins
- [Windows (Cache Hit)](https://github.com/ROCm/TheRock/actions/runs/22223608689/job/64284721704) - 26 mins

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

> Forks: S3 caching is only active for ROCm-owned runs. Fork users can set cache_type to ccache or none, or leave the default; sccache will work locally without S3 access.

---------

subodh-dubey-amd added 14 commits January 30, 2026 07:00

Add sccache support for ROCm compiler caching in pytorch build workflows

d17ba87

Fix symlink target creation in setup_sccache_rocm.py to use string concatenation instead of Path() for relative links

67d763c

Refactor sccache wrapper creation in setup_sccache_rocm.py to improve symlink handling and path resolution.

7848914

Refactor build_prod_wheels.py to wrap code in try/finally for sccache cleanup

0debddf

Remove unused CACHE_DIR environment variable from workflow

68462e8

Refactor whitespace and formatting

f3354a3

Update sccache S3 key prefixes for Linux and Windows workflows; enhance CMake launcher setup for ROCm builds

b0dbca2

Add --use-lld flag for faster linking (Linux only)

9872074

Remove lld linker support

d55656a

lld doesn't work with mixed GCC/Clang builds - Triton uses GCC which doesn't support -fuse-ld=/path/to/lld syntax. Only Clang supports full path linker specification.

Update CMake launcher setup for ROCm builds to remove HIP compiler support in sccache;

6bced63

Refactor sccache wrapper creation in setup_sccache_rocm.py to enhance error handling and improve binary management.

5721f0b

Enhance CMake launcher setup in build_prod_wheels.py for Windows compatibility

01d0bbe

Clarify CMake launcher setup in build_prod_wheels.py and update Windows handling for ROCm builds; remove HIP compiler launcher due to compatibility issues.

72cbdba

Add sccache logging to pytorch workflows

07bd13c

github-project-automation Bot added this to TheRock Triage Jan 30, 2026

github-project-automation Bot moved this to TODO in TheRock Triage Jan 30, 2026

Refactor whitespace and formatting using precommit

99be1df

subodh-dubey-amd marked this pull request as ready for review January 30, 2026 12:55

ScottTodd reviewed Jan 30, 2026

ScottTodd requested a review from amd-shiraz January 30, 2026 18:05

subodh-dubey-amd added 7 commits January 31, 2026 07:09

Update sccache installation in Dockerfile

0b69188

- Remove fallback logic: fail fast if sccache setup fails

8022152

- Simplify stats output: just use sccache --show-stats - Make --use-ccache and --use-sccache mutually exclusive - Remove parse_sccache_stats and print_sccache_stats function

Refactor sccache stats output handling in build_prod_wheels.py

85857d4

Introduced cache_type input for both Linux and Windows workflows to specify the compiler cache type (sccache, ccache, or none).

90b2b9d

Temp: Add temporary sccache installation step in workflow for cache type support

45b0403

Remove temporary sccache installation step from workflow

fd3c72b

Update Docker image reference in build workflow for portable Linux PyTorch wheels to a user-specific version

9c95c75

subodh-dubey-amd force-pushed the users/subodh-dubey-amd/ccache-pytorch branch from 8aa2314 to 9c95c75 Compare February 1, 2026 16:01

subodh-dubey-amd requested a review from ScottTodd February 1, 2026 17:24

Update sccache bucket naming in build workflows for Linux and Windows to include release type

0526240

subodh-dubey-amd added 3 commits February 2, 2026 06:23

Standardize sccache bucket naming in build workflows for Linux and Windows to improve consistency

5a087b9

Update Docker image reference in build workflow for portable Linux PyTorch wheels to include sccache and Add TODO for SHA pinning after merge

7f3a181

Update Docker image reference in build workflow for portable Linux PyTorch wheels to a specific SHA and refine TODO for future updates

344618a

ScottTodd mentioned this pull request Feb 7, 2026

[ci][torch] Build Linux PyTorch wheels as part of ci.yml workflows #3303

Merged

1 task

ScottTodd reviewed Feb 10, 2026


subodh-dubey-amd requested review from marbre and removed request for marbre February 11, 2026 07:16

subodh-dubey-amd marked this pull request as draft February 11, 2026 13:50

This was referenced Feb 12, 2026

Use manylinux image with sccache for Linux PyTorch wheel builds #3387

Closed

Refactor do_build in build_prod_wheels.py for reviewability #3389

Merged

subodh-dubey-amd mentioned this pull request Feb 18, 2026

Add sccache support to build scripts with ROCm compiler wrapping #3482

Merged

1 task

subodh-dubey-amd mentioned this pull request Feb 20, 2026

Wire sccache into PyTorch wheel build workflows #3532

Merged

1 task

subodh-dubey-amd closed this Feb 20, 2026

github-project-automation Bot moved this from TODO to Done in TheRock Triage Feb 20, 2026

subodh-dubey-amd deleted the users/subodh-dubey-amd/ccache-pytorch branch February 20, 2026 12:02

	######## CCache ########
	WORKDIR /install-ccache
	COPY install_ccache.sh ./
	RUN ./install_ccache.sh "4.11.2" && rm -rf /install-ccache

	# TODO: We shouldn't be using a cache on actual release branches, but it
	# really helps for iteration time.
	- name: Setup ccache
	  run: |
	    ./build_tools/setup_ccache.py \
	      --config-preset "github-oss-presubmit" \
	      --dir "$(dirname $CCACHE_CONFIGPATH)" \
	      --local-path "$CACHE_DIR/ccache"

	- name: Runner health status
	  run: |
	    ./build_tools/health_status.py

	class CheckCCache(CheckProgram):
	    def __init__(self, required=False):
	        super().__init__(required)
	        self.program = FindCCache()
	        self.name = "CCache"

	class FindCCache(FindProgram):
	    def __init__(self):
	        super().__init__()
	        self.name = "ccache"
	        self.get_version()

	def device_ccache_system(self):
	    """
	    Returns a pair of string lists that contain information about the ccache on
	    the system. If ccache is not installed, strings stating this are returned.

	    CCACHE_STAT (= [0]) contains general status about ccache
	    CCACHE_CONFIG ( = [1]) contains the ccache config
	    """

	    ccache = []
	    try:
	        proc = subprocess.run(
	            ["ccache", "-s", "-v"], capture_output=True, text=True, check=True
	        )

	        ccache.append([proc.stdout.splitlines()])
	    except (subprocess.CalledProcessError, FileNotFoundError):
	        ccache.append(["Ccache not detected!"])
	        ccache.append([""])
	        return ccache

	    try:
	        proc = subprocess.run(
	            ["ccache", "--show-config"], capture_output=True, text=True, check=True
	        )
	        ccache.append([proc.stdout.splitlines()])
	    except (subprocess.CalledProcessError, FileNotFoundError):
	        ccache.append([""])

	    return ccache
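
For comparison, an sccache counterpart to the ccache probe above could shell out to `sccache --show-stats`, the command one of the commits in this PR settled on for stats output. This is only a sketch under that assumption: the function name `sccache_stats_lines` and the fallback message are hypothetical and do not appear in the PR.

```python
import subprocess
from typing import List


def sccache_stats_lines() -> List[str]:
    """Return `sccache --show-stats` output as lines, or a notice if unavailable."""
    try:
        proc = subprocess.run(
            ["sccache", "--show-stats"], capture_output=True, text=True, check=True
        )
    except (subprocess.CalledProcessError, OSError):
        # OSError covers FileNotFoundError when the binary is not on PATH.
        return ["sccache not detected!"]
    return proc.stdout.splitlines()
```

Unlike the ccache probe, this returns a flat list of strings, so a caller can print it line by line without unwrapping a nested list.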

	- name: Report
	  if: ${{ !cancelled() }}
	  shell: bash
	  run: |
	    if [ -d "./build" ]; then
	      echo "Full SDK du:"
	      echo "------------"
	      du -h -d 1 build/dist/rocm
	      echo "Artifact Archives:"
	      echo "------------------"
	      ls -lh build/artifacts/*.tar.xz
	      echo "Artifacts:"
	      echo "----------"
	      du -h -d 1 build/artifacts
	      echo "CCache Stats:"
	      echo "-------------"
	      ccache -s -v
	      tail -v -n +1 .ccache/compiler_check_cache/* > build/logs/ccache_compiler_check_cache.log
	    else
	      echo "[ERROR] Build directory ./build does not exist. Skipping report!"
	      echo " This should only happen if the CI is cancelled before the build step."
	      exit 1
	    fi


