Merged
31 commits
03f366c
[AMD] Support ROCm 7.2 images
akao-amd Feb 9, 2026
2c93971
[AMD] Tweak test files
akao-amd Feb 9, 2026
f41d4fb
[AMD] Add tests but not enable them for PR event
akao-amd Feb 9, 2026
ffff7d4
[AMD][DO NOT MERGE] Changes to generic codes
akao-amd Feb 9, 2026
6a288da
Fix misleading naming and restore vllm fallback path for AMD
yctseng0211 Feb 9, 2026
a94ac51
revet adding vllm path in fused_moe
yctseng0211 Feb 9, 2026
75ed70d
test on, release off, switch to 0209-preview
akao-amd Feb 6, 2026
4e19b66
Fix human-evel editable build with legacy setup.py
akao-amd Feb 10, 2026
954aba3
Fix non-deterministic CI test partitioning by adding filename tie-bre…
yctseng0211 Feb 10, 2026
42b7f9f
turn on all pr tests
yctseng0211 Feb 10, 2026
02878b1
Fix Janus pro
akao-amd Feb 10, 2026
875e49c
Fix test_mamba_ssm_ssd.py
akao-amd Feb 10, 2026
3603182
fix lint
bingxche Feb 10, 2026
49c9aef
fix triton.knobs.amd.use_buffer_ops
bingxche Feb 10, 2026
3882f5e
temp pip install amdsmi for multimodal test
bingxche Feb 10, 2026
0b25784
use AMDGCN_USE_BUFFER_OP instead of triton.knobs.amd.use_buffer_ops
bingxche Feb 10, 2026
ae3a281
rccl warmup for amd ci
yctseng0211 Feb 10, 2026
2cd26fc
turn on multimodal test for 7.0 pr test
yctseng0211 Feb 10, 2026
2551417
build mori in rocm7.2
yctseng0211 Feb 10, 2026
bce8005
Add PYTORCH_ROCM_ARCH
akao-amd Feb 10, 2026
4f73a3b
Update nightly test workflow to run on a schedule and adjust monitore…
michaelzhang-ai Feb 11, 2026
8add344
increase timeout to 60min for multimodal-gen-test-1-gpu-amd
bingxche Feb 11, 2026
e7b6a0f
do not cancel docker image release through pr push
bingxche Feb 11, 2026
105418b
wrap up rocm720 related workflow, docker release, pr tests and nightl…
bingxche Feb 11, 2026
2703b6a
revet temp turn-on
yctseng0211 Feb 11, 2026
0c0a33e
Revert "revet temp turn-on"
bingxche Feb 11, 2026
81e1352
run all tests in parallel in pr test rocm720
bingxche Feb 11, 2026
ac7f401
fix sglang branch in rocm720 dockerfile
bingxche Feb 11, 2026
8b05235
fix pretend version for rocm72
yctseng0211 Feb 11, 2026
6578c95
remove aiter rebuild in install dependencies
yctseng0211 Feb 11, 2026
0db456b
fix hardcoded fallback image finding logic
bingxche Feb 12, 2026
868 changes: 868 additions & 0 deletions .github/workflows/nightly-test-amd-rocm720.yml

Large diffs are not rendered by default.

793 changes: 793 additions & 0 deletions .github/workflows/pr-test-amd-rocm720.yml

Large diffs are not rendered by default.

22 changes: 20 additions & 2 deletions .github/workflows/pr-test-amd.yml
@@ -396,7 +396,16 @@ jobs:

   multimodal-gen-test-1-gpu-amd:
     needs: [check-changes]
-    if: needs.check-changes.outputs.multimodal_gen == 'true'
+    if: |
+      always() &&
+      (
+        (inputs.target_stage == 'multimodal-gen-test-1-gpu-amd') ||
+        (
+          !inputs.target_stage &&
+          (!failure() && !cancelled()) &&
+          ((needs.check-changes.outputs.main_package == 'true') || (needs.check-changes.outputs.sgl_kernel == 'true'))
+        )
+      )
     strategy:
       fail-fast: false
       max-parallel: 1 # Run one at a time to avoid eviction from resource exhaustion during AITER kernel JIT
@@ -516,7 +525,16 @@ jobs:

   multimodal-gen-test-2-gpu-amd:
     needs: [check-changes]
-    if: needs.check-changes.outputs.multimodal_gen == 'true'
+    if: |
+      always() &&
+      (
+        (inputs.target_stage == 'multimodal-gen-test-2-gpu-amd') ||
+        (
+          !inputs.target_stage &&
+          (!failure() && !cancelled()) &&
+          ((needs.check-changes.outputs.main_package == 'true') || (needs.check-changes.outputs.sgl_kernel == 'true'))
+        )
+      )
     strategy:
       fail-fast: false
       max-parallel: 1 # Run one at a time to avoid eviction from resource exhaustion during AITER kernel JIT
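
Read as a decision rule, the rewritten `if:` on both multimodal jobs has two paths. If `inputs.target_stage` names the job, the job runs unconditionally. If no stage is targeted, the job runs only when nothing upstream failed or was cancelled and `check-changes` reports a change in `main_package` or `sgl_kernel` (the old condition keyed on `multimodal_gen` instead). The leading `always()` replaces the implicit `success()` check that GitHub Actions otherwise applies, which is why the `!failure() && !cancelled()` guard is spelled out on the default path. The Bash sketch below restates that rule for illustration only: `should_run` and its positional arguments are hypothetical names, not part of the workflow, and `upstream_ok` stands in for `!failure() && !cancelled()`.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the workflow's `if:` expression.
# Usage: should_run JOB TARGET_STAGE UPSTREAM_OK MAIN_PACKAGE SGL_KERNEL
# Returns 0 (run the job) or 1 (skip it).
should_run() {
  local job="$1" target_stage="$2" upstream_ok="$3"
  local main_package="$4" sgl_kernel="$5"

  # Path 1: the job was targeted explicitly, so it runs regardless of
  # upstream status or detected changes.
  if [ "$target_stage" = "$job" ]; then
    return 0
  fi

  # Path 2: no stage targeted. Require a healthy upstream and a change in
  # either the main package or sgl_kernel.
  if [ -z "$target_stage" ] && [ "$upstream_ok" = "true" ] &&
     { [ "$main_package" = "true" ] || [ "$sgl_kernel" = "true" ]; }; then
    return 0
  fi

  # Any other combination (including a different targeted stage) skips the job.
  return 1
}
```

Under this reading, targeting `multimodal-gen-test-2-gpu-amd` skips `multimodal-gen-test-1-gpu-amd` even when the package changed, because the default path requires `target_stage` to be empty.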
@@ -0,0 +1,82 @@
+name: Release Docker Images ROCm 7.2.0 Nightly Preview (AMD)
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: '0 13 * * *'
+
+concurrency:
+  # A PR number if a pull request and otherwise the commit hash. This cancels
+  # queued and in-progress runs for the same PR (presubmit) or commit
+  # (postsubmit). The workflow name is prepended to avoid conflicts between
+  # different workflows.
+  group: ${{ github.workflow }}-${{ github.event.number || github.sha }}
+  cancel-in-progress: True
+
+jobs:
+  publish:
+    if: github.repository == 'sgl-project/sglang'
+    runs-on: amd-docker-scale
+    environment: 'prod'
+    strategy:
+      fail-fast: false
+      matrix:
+        gpu_arch: ['gfx942-rocm720', 'gfx950-rocm720']
+        build_type: ['all']
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0 # Required for git describe to find tags
+
+      - name: "Set Date"
+        run: |
+          echo "DATE=$(date +%Y%m%d)" >> $GITHUB_ENV
+
+      - name: Get version from latest tag
+        id: version
+        run: |
+          # Get the latest version tag sorted by version number (e.g., v0.5.7 -> 0.5.7)
+          VERSION=$(git tag -l 'v[0-9]*' --sort=-v:refname | head -1 | sed 's/^v//')
+
+          if [ -z "$VERSION" ]; then
+            echo "::error::Could not determine version from git tags"
+            exit 1
+          fi
+
+          # Get short commit hash of current HEAD
+          COMMIT_HASH=$(git rev-parse --short HEAD)
+
+          # Compose pretend version for setuptools_scm: e.g., 0.5.8.post1.dev20260211+g1a2b3c4
+          PRETEND_VERSION="${VERSION}.dev${{ env.DATE }}+g${COMMIT_HASH}"
+
+          echo "version=${VERSION}" >> $GITHUB_OUTPUT
+          echo "pretend_version=${PRETEND_VERSION}" >> $GITHUB_OUTPUT
+          echo "Detected version: ${VERSION}"
+          echo "Pretend version for pip: ${PRETEND_VERSION}"
+
+      - name: Login to Docker Hub
+        uses: docker/login-action@v2
+        with:
+          username: ${{ secrets.DOCKERHUB_AMD_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_AMD_TOKEN }}
+
+      - name: Build and Push
+        run: |
+          version=${{ steps.version.outputs.version }}
+          pretend_version=${{ steps.version.outputs.pretend_version }}
+          echo "Version: ${version}"
+          echo "Pretend version: ${pretend_version}"
+
+          if [ "${{ matrix.gpu_arch }}" = "gfx942-rocm720" ]; then
+            rocm_tag="rocm720-mi30x"
+          elif [ "${{ matrix.gpu_arch }}" = "gfx950-rocm720" ]; then
+            rocm_tag="rocm720-mi35x"
+          else
+            echo "Unsupported gfx arch"
+            exit 1
+          fi
+
+          tag=v${version}-${rocm_tag}
+
+          docker build . -f docker/rocm720.Dockerfile --build-arg BUILD_TYPE=${{ matrix.build_type }} --build-arg GPU_ARCH=${{ matrix.gpu_arch }} --build-arg ENABLE_MORI=1 --build-arg NIC_BACKEND=ainic --build-arg SETUPTOOLS_SCM_PRETEND_VERSION=${pretend_version} -t rocm/sgl-dev:${tag}-${{ env.DATE }}-preview --no-cache
+          docker push rocm/sgl-dev:${tag}-${{ env.DATE }}-preview
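
The naming logic in the `Get version from latest tag` and `Build and Push` steps can be checked locally without running git or Docker. The sketch below pulls the string composition out into three small Bash functions. The function names and the sample inputs (`0.5.8`, `20260211`, `1a2b3c4`) are assumptions made for illustration; only the format strings and the two `gpu_arch` mappings come from the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that reproduce the workflow's string composition only.
# They do not call git or docker.

# Build the value passed as SETUPTOOLS_SCM_PRETEND_VERSION.
# Args: VERSION (tag without the leading "v"), DATE (YYYYMMDD), COMMIT_HASH (short).
compose_pretend_version() {
  printf '%s.dev%s+g%s\n' "$1" "$2" "$3"
}

# Map a matrix gpu_arch value to the ROCm tag suffix used in the image name.
rocm_tag_for() {
  case "$1" in
    gfx942-rocm720) echo "rocm720-mi30x" ;;
    gfx950-rocm720) echo "rocm720-mi35x" ;;
    *) echo "Unsupported gfx arch" >&2; return 1 ;;
  esac
}

# Compose the full image reference that the workflow builds and pushes.
# Args: VERSION, GPU_ARCH, DATE.
image_ref() {
  local version="$1" arch="$2" date="$3" rocm_tag
  rocm_tag="$(rocm_tag_for "$arch")" || return 1
  printf 'rocm/sgl-dev:v%s-%s-%s-preview\n' "$version" "$rocm_tag" "$date"
}

# Example with made-up inputs:
compose_pretend_version 0.5.8 20260211 1a2b3c4   # 0.5.8.dev20260211+g1a2b3c4
image_ref 0.5.8 gfx950-rocm720 20260211          # rocm/sgl-dev:v0.5.8-rocm720-mi35x-20260211-preview
```

Keeping the arch-to-tag mapping in a single `case` makes the unsupported-arch failure explicit, which matches the workflow's `exit 1` branch.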