[ROCm] torch 2.11 + rocclr profiler hotfix for ROCm 7.2 + aiter v0.1.12.post2#40035

Closed
Rohan138 wants to merge 12 commits into vllm-project:main from ROCm:rocclr_profiler_hotfix

@Rohan138 Rohan138 commented Apr 16, 2026

Contributor

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@Rohan138

Contributor Author

cc @gshtras @tjtanaa

@mergify mergify Bot added the ci/build and rocm (Related to AMD ROCm) labels Apr 16, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 16, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the ROCm base image to version 7.2.2 and implements a torch profiler hotfix by building the Compute Language Runtime from source. Feedback suggests using an immutable commit SHA instead of a personal branch for the hotfix source to improve stability and security. Furthermore, the build configuration should explicitly set the installation prefix to /opt/rocm/ to ensure the patched libraries are correctly utilized and not shadowed by original system files.

Comment thread docker/Dockerfile.rocm_base Outdated

# torch profiler hotfix for 7.2.2: cherry-pick ROCm/rocm-systems#5062 and rebuild CLR
# will be removed once we move to ROCm 7.13
ARG CLR_HOTFIX_BRANCH="users/saleelk/7.2cp2"

Contributor

Severity: high

Using a personal user branch (users/saleelk/7.2cp2) for a core dependency hotfix is a significant risk to build stability and security, as the branch can be deleted or modified without notice. It is highly recommended to use a specific, immutable commit SHA instead to ensure reproducibility.
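
One way to guard against this kind of regression is a small pre-build check that flags Dockerfile `ARG` defaults that are not full commit SHAs. The following is a minimal sketch, not part of this PR: the helper names `is_pinned_commit` and `unpinned_args` are hypothetical, and the check is only a heuristic (it tests whether the value is shaped like a full 40- or 64-character hex object name; it does not confirm the commit exists in the repository).

```python
import re

# Full SHA-1 (40 hex chars) or SHA-256 (64 hex chars) git object name.
_FULL_SHA = re.compile(r"(?:[0-9a-f]{40}|[0-9a-f]{64})")
# Dockerfile lines of the form: ARG NAME="value" or ARG NAME=value
_ARG_LINE = re.compile(r"^\s*ARG\s+(\w+)=(.*)$")


def is_pinned_commit(ref: str) -> bool:
    """Return True if ref is shaped like a full commit SHA.

    Branch names, tags, and abbreviated SHAs all return False.
    """
    return _FULL_SHA.fullmatch(ref.strip().lower()) is not None


def unpinned_args(dockerfile_text: str, names: set[str]) -> dict[str, str]:
    """Return {ARG name: default value} for listed ARGs not pinned to a full SHA.

    `names` is the set of ARG names that are expected to hold a git ref.
    ARGs declared without a default (no "=") are ignored by this sketch.
    """
    found: dict[str, str] = {}
    for line in dockerfile_text.splitlines():
        match = _ARG_LINE.match(line)
        if match is None or match.group(1) not in names:
            continue
        # Drop surrounding whitespace and optional quotes around the default.
        value = match.group(2).strip().strip("\"'")
        if not is_pinned_commit(value):
            found[match.group(1)] = value
    return found
```

Run against the line quoted above, `unpinned_args('ARG CLR_HOTFIX_BRANCH="users/saleelk/7.2cp2"', {"CLR_HOTFIX_BRANCH"})` reports the branch name as unpinned. An abbreviated SHA such as `a9d2b4a` is also rejected, since short prefixes can become ambiguous as a repository grows.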

Comment thread docker/Dockerfile.rocm_base
Rohan138 and others added 8 commits April 16, 2026 10:51
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@Rohan138 Rohan138 changed the title from "Rocclr profiler hotfix for ROCm 7.2" to "[ROCm] torch 2.11 + rocclr profiler hotfix for ROCm 7.2" Apr 20, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@Rohan138 Rohan138 force-pushed the rocclr_profiler_hotfix branch from a9d2b4a to d69efb1 Compare April 20, 2026 17:35
@Rohan138 Rohan138 changed the title from "[ROCm] torch 2.11 + rocclr profiler hotfix for ROCm 7.2" to "[ROCm] torch 2.11 + rocclr profiler hotfix for ROCm 7.2 + aiter v0.1.12.post1" Apr 20, 2026
gshtras commented Apr 22, 2026

Collaborator

CI run with the new base image:
https://buildkite.com/vllm/amd-ci/builds/7898/summary

@gshtras gshtras added the ready (ONLY add when PR is ready to merge/full CI is needed) label Apr 22, 2026
@Rohan138 Rohan138 changed the title from "[ROCm] torch 2.11 + rocclr profiler hotfix for ROCm 7.2 + aiter v0.1.12.post1" to "[ROCm] torch 2.11 + rocclr profiler hotfix for ROCm 7.2 + aiter v0.1.12.post2" Apr 29, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@Rohan138 Rohan138 force-pushed the rocclr_profiler_hotfix branch from 437fec3 to ec4c422 Compare April 29, 2026 00:18
tjtanaa commented Apr 29, 2026

Member

Can we split this PR and ship whichever part is working first?

@Rohan138 Rohan138 closed this May 6, 2026
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD May 6, 2026
Rohan138 commented May 6, 2026

Contributor Author

torch 2.11 is still being fixed; the ROCm 7.2 + CLR fix + AITER bump was merged in #41386.

@Rohan138 Rohan138 deleted the rocclr_profiler_hotfix branch May 6, 2026 18:52
Labels

ci/build, ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm)

Projects

Status: Done

3 participants