Update Dockerfile.rocm_ci by clintg6 · Pull Request #6 · ROCm/flashinfer

clintg6 · 2025-10-01T23:35:47Z

Summary

This PR updates the Dockerfile to ensure the Micromamba environment is automatically activated when the Docker container is started, without requiring the user to manually run activation commands.

Changes

- Appended the Micromamba shell hook and environment activation to ~/.bashrc.
  - This ensures the environment is active in all interactive shell sessions (e.g., docker run -it).
- Keeps the existing build process and environment creation logic unchanged.
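As a rough illustration of the change described above, the following sketch appends a Micromamba hook and an activation line to a bashrc-style file. It is not the code from the Dockerfile in this PR: the function name `append_micromamba_activation` and the environment name passed to it are hypothetical, and the real environment name is not stated in this conversation. The `grep` guard is an added assumption to keep repeated runs from duplicating lines.

```shell
#!/usr/bin/env bash
# Sketch only: append Micromamba activation to a bashrc-style file.
# The function name and the environment name are placeholders for illustration.
append_micromamba_activation() {
    local rc_file="$1"   # e.g. "$HOME/.bashrc"
    local env_name="$2"  # hypothetical; substitute the environment the image creates

    # Do nothing if the hook line is already present (keeps the file idempotent).
    if grep -qF 'micromamba shell hook' "$rc_file" 2>/dev/null; then
        return 0
    fi

    {
        # Single quotes so the command substitution runs when the shell starts,
        # not when this function runs.
        echo 'eval "$(micromamba shell hook --shell bash)"'
        echo "micromamba activate ${env_name}"
    } >> "$rc_file"
}
```

Because bash reads ~/.bashrc only for interactive shells, lines added this way would not affect a non-interactive invocation such as `docker run <image> <command>`.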

Why?

Previously, users had to activate the Micromamba environment manually after starting the container. This update streamlines the user experience by making the environment ready to use immediately after container startup.

Test

You can test this by building and running the container interactively:

docker build -f docker/Dockerfile.rocm_ci --target flashinfer_base -t flashinfer-rocm . 2>&1 | tee docker_build.log

and running

docker run -it --network=host --group-add=video \
           --privileged --ipc=host --cap-add=SYS_PTRACE \
           --security-opt seccomp=unconfined --device /dev/kfd \
           --device /dev/dri flashinfer-rocm

Then inside the container, run:

pip show flashinfer
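Beyond `pip show flashinfer`, a quick way to confirm that the environment was activated by the shell startup files is to check which `pip` the shell resolves. The helper below is a sketch under assumptions: the name `env_looks_active` is hypothetical, and it relies on the activation exporting `CONDA_PREFIX`, which is the conda-compatible convention but is not confirmed anywhere in this PR.

```shell
# Sketch only: returns 0 when a conda-style environment looks active.
# Assumes activation exports CONDA_PREFIX (not confirmed by this PR).
env_looks_active() {
    # No prefix exported means no environment was activated.
    [ -n "${CONDA_PREFIX:-}" ] || return 1

    local pip_path
    pip_path="$(command -v pip)" || return 1

    # The resolved pip should live under the active environment's prefix.
    case "$pip_path" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside the container, `env_looks_active && echo "environment active"` would print the message only when both conditions hold.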

Copilot

Pull Request Overview

This PR aims to auto-activate the Micromamba environment in interactive shells by appending initialization and activation commands to the container’s ~/.bashrc.

- Append Micromamba shell hook and environment activation to ~/.bashrc
- Keep existing build and environment creation logic unchanged


docker/Dockerfile.rocm_ci

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Clint <clintg6@users.noreply.github.com>

diptorupd · 2025-10-03T22:23:15Z

@clintg6 I am fine with these changes. Let @rtmadduri have a look once and I will merge it.

This PR fixes some of the unit test failures that occur in Single Decode. It also disables clang formatting of headers. The clang format of headers causes compilation issues. The compiler is unable to find `HIP WARP SYNC INTRINSICS` causing failures. Disabling clang format fixes these issues.

```
Start 1: MathTest
1/6 Test #1: MathTest ......................... Passed 3.31 sec
Start 2: PosEncTest
2/6 Test #2: PosEncTest ....................... Passed 3.36 sec
Start 3: CascadeTest
3/6 Test #3: CascadeTest ...................... Passed 3.35 sec
Start 4: PageTest
4/6 Test #4: PageTest ......................... Passed 114.08 sec
Start 5: SingleDecodeTest
5/6 Test #5: SingleDecodeTest ................. Passed 35.22 sec
Start 6: BatchDecodeTest
6/6 Test #6: BatchDecodeTest .................. Passed 559.75 sec

100% tests passed, 0 tests failed out of 6

Total Test time (real) = 719.07 sec
```

In this PR, we add infra for enabling decode via flashinfer gpu_iface. This PR does not change existing infrastructure and we can still build decode using AOT and JIT. Tested locally:

```
Start 5: SingleDecodeTest
5/6 Test #5: SingleDecodeTest ................. Passed 35.12 sec
Start 6: BatchDecodeTest
6/6 Test #6: BatchDecodeTest .................. Passed 541.87 sec
```

We will have a follow-up PR for enabling AOT decode using flashinfer gpu_iface.

The CPP test suite was using `hipified` headers. In this PR, we port the unit tests over to use `gpu_iface`. This is necessary because the next step is to move the build infrastructure to use `gpu_iface`. This PR has been tested locally:

```
Test project /root/flashinfer/libflashinfer/tests/hip/build
Start 1: MathTest
1/6 Test #1: MathTest ......................... Passed 3.40 sec
Start 2: PosEncTest
2/6 Test #2: PosEncTest ....................... Passed 3.40 sec
Start 3: CascadeTest
3/6 Test #3: CascadeTest ...................... Passed 985.27 sec
Start 4: PageTest
4/6 Test #4: PageTest ......................... Passed 112.40 sec
Start 5: SingleDecodeTest
5/6 Test #5: SingleDecodeTest ................. Passed 35.46 sec
Start 6: BatchDecodeTest
6/6 Test #6: BatchDecodeTest .................. Passed 556.81 sec

100% tests passed, 0 tests failed out of 6
```

To replicate the tests:

```
cd flashinfer/libflashinfer/tests/hip
mkdir build && cd build/
cmake -DCMAKE_PREFIX_PATH=/root/libtorch -DCMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++ -DFLASHINFER_INCLUDE_DIRS=/root/flashinfer/libflashinfer/include/ ..
make
ctest
```

In this PR I remove the `libtorch` dependency and `test_page.cpp`. `test_page.cpp` is the only unit test that uses libtorch. However, we also have a pytest for testing page, and we will use that for validation. Removing the libtorch dependency will help us speed up Docker builds and drop additional dependencies.

```
Test project /root/flashinfer/libflashinfer/tests/hip/build
Start 1: MathTest
1/8 Test #1: MathTest ............................ Passed 0.31 sec
Start 2: PosEncTest
2/8 Test #2: PosEncTest .......................... Passed 0.31 sec
Start 3: CascadeTest
3/8 Test #3: CascadeTest ......................... Passed 1369.12 sec
Start 4: SingleDecodeTest
4/8 Test #4: SingleDecodeTest .................... Passed 7726.35 sec
Start 5: BatchDecodeTest
5/8 Test #5: BatchDecodeTest ..................... Passed 811.61 sec
Start 6: test_mfma_fp32_16x16x16fp16
6/8 Test #6: test_mfma_fp32_16x16x16fp16 ......... Passed 0.30 sec
Start 7: test_transpose_4x4_half_registers
7/8 Test #7: test_transpose_4x4_half_registers ... Passed 0.28 sec
Start 8: test_rowsum
8/8 Test #8: test_rowsum ......................... Passed 0.27 sec

100% tests passed, 0 tests failed out of 8
```


diptorupd requested a review from rtmadduri October 2, 2025 03:26

demandal25 requested review from demandal25 and removed request for demandal25 October 2, 2025 17:46

diptorupd requested a review from Copilot October 3, 2025 19:29

diptorupd force-pushed the fix/docker branch from b9acf6d to 8171486 Compare October 3, 2025 19:29

Copilot AI reviewed Oct 3, 2025



clintg6 and others added 2 commits October 3, 2025 17:19

Update Dockerfile.rocm_ci

7bd5848

Update docker/Dockerfile.rocm_ci

e18ba78

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Clint <clintg6@users.noreply.github.com>

diptorupd force-pushed the fix/docker branch from 9a08701 to e18ba78 Compare October 3, 2025 22:19

demandal25 self-requested a review October 7, 2025 14:27

Merge branch 'amd-integration' into fix/docker

4c85994

demandal25 approved these changes Oct 7, 2025


diptorupd merged commit 47aac38 into amd-integration Oct 7, 2025
1 check passed

diptorupd deleted the fix/docker branch October 23, 2025 16:14

diptorupd mentioned this pull request Nov 19, 2025

Release 012026 #60

Closed

20 tasks

Copilot AI mentioned this pull request Jan 21, 2026

Updated changelog for Jan 2026 release #119

Merged
