[AMD] rocm 7.2 image release, PR test, Nightly Test #17799
Merged
31 commits
03f366c [AMD] Support ROCm 7.2 images (akao-amd)
2c93971 [AMD] Tweak test files (akao-amd)
f41d4fb [AMD] Add tests but not enable them for PR event (akao-amd)
ffff7d4 [AMD][DO NOT MERGE] Changes to generic codes (akao-amd)
6a288da Fix misleading naming and restore vllm fallback path for AMD (yctseng0211)
a94ac51 revet adding vllm path in fused_moe (yctseng0211)
75ed70d test on, release off, switch to 0209-preview (akao-amd)
4e19b66 Fix human-evel editable build with legacy setup.py (akao-amd)
954aba3 Fix non-deterministic CI test partitioning by adding filename tie-bre… (yctseng0211)
42b7f9f turn on all pr tests (yctseng0211)
02878b1 Fix Janus pro (akao-amd)
875e49c Fix test_mamba_ssm_ssd.py (akao-amd)
3603182 fix lint (bingxche)
49c9aef fix triton.knobs.amd.use_buffer_ops (bingxche)
3882f5e temp pip install amdsmi for multimodal test (bingxche)
0b25784 use AMDGCN_USE_BUFFER_OP instead of triton.knobs.amd.use_buffer_ops (bingxche)
ae3a281 rccl warmup for amd ci (yctseng0211)
2cd26fc turn on multimodal test for 7.0 pr test (yctseng0211)
2551417 build mori in rocm7.2 (yctseng0211)
bce8005 Add PYTORCH_ROCM_ARCH (akao-amd)
4f73a3b Update nightly test workflow to run on a schedule and adjust monitore… (michaelzhang-ai)
8add344 increase timeout to 60min for multimodal-gen-test-1-gpu-amd (bingxche)
e7b6a0f do not cancel docker image release through pr push (bingxche)
105418b wrap up rocm720 related workflow, docker release, pr tests and nightl… (bingxche)
2703b6a revet temp turn-on (yctseng0211)
0c0a33e Revert "revet temp turn-on" (bingxche)
81e1352 run all tests in parallel in pr test rocm720 (bingxche)
ac7f401 fix sglang branch in rocm720 dockerfile (bingxche)
8b05235 fix pretend version for rocm72 (yctseng0211)
6578c95 remove aiter rebuild in install dependencies (yctseng0211)
0db456b fix hardcoded fallback image finding logic (bingxche)
82 changes: 82 additions & 0 deletions
.github/workflows/release-docker-amd-rocm720-nightly-preview.yml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| name: Release Docker Images ROCm 7.2.0 Nightly Preview (AMD) | ||
| on: | ||
| workflow_dispatch: | ||
| schedule: | ||
| - cron: '0 13 * * *' | ||
| concurrency: | ||
| # A PR number if a pull request and otherwise the commit hash. This cancels | ||
| # queued and in-progress runs for the same PR (presubmit) or commit | ||
| # (postsubmit). The workflow name is prepended to avoid conflicts between | ||
| # different workflows. | ||
| group: ${{ github.workflow }}-${{ github.event.number || github.sha }} | ||
| cancel-in-progress: True | ||
| jobs: | ||
| publish: | ||
| if: github.repository == 'sgl-project/sglang' | ||
| runs-on: amd-docker-scale | ||
| environment: 'prod' | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| gpu_arch: ['gfx942-rocm720', 'gfx950-rocm720'] | ||
| build_type: ['all'] | ||
| steps: | ||
| - name: Checkout repository | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| fetch-depth: 0 # Required for git describe to find tags | ||
| - name: "Set Date" | ||
| run: | | ||
| echo "DATE=$(date +%Y%m%d)" >> $GITHUB_ENV | ||
| - name: Get version from latest tag | ||
| id: version | ||
| run: | | ||
| # Get the latest version tag sorted by version number (e.g., v0.5.7 -> 0.5.7) | ||
| VERSION=$(git tag -l 'v[0-9]*' --sort=-v:refname | head -1 | sed 's/^v//') | ||
| if [ -z "$VERSION" ]; then | ||
| echo "::error::Could not determine version from git tags" | ||
| exit 1 | ||
| fi | ||
| # Get short commit hash of current HEAD | ||
| COMMIT_HASH=$(git rev-parse --short HEAD) | ||
| # Compose pretend version for setuptools_scm: e.g., 0.5.8.post1.dev20260211+g1a2b3c4 | ||
| PRETEND_VERSION="${VERSION}.dev${{ env.DATE }}+g${COMMIT_HASH}" | ||
| echo "version=${VERSION}" >> $GITHUB_OUTPUT | ||
| echo "pretend_version=${PRETEND_VERSION}" >> $GITHUB_OUTPUT | ||
| echo "Detected version: ${VERSION}" | ||
| echo "Pretend version for pip: ${PRETEND_VERSION}" | ||
| - name: Login to Docker Hub | ||
| uses: docker/login-action@v2 | ||
| with: | ||
| username: ${{ secrets.DOCKERHUB_AMD_USERNAME }} | ||
| password: ${{ secrets.DOCKERHUB_AMD_TOKEN }} | ||
| - name: Build and Push | ||
| run: | | ||
| version=${{ steps.version.outputs.version }} | ||
| pretend_version=${{ steps.version.outputs.pretend_version }} | ||
| echo "Version: ${version}" | ||
| echo "Pretend version: ${pretend_version}" | ||
| if [ "${{ matrix.gpu_arch }}" = "gfx942-rocm720" ]; then | ||
| rocm_tag="rocm720-mi30x" | ||
| elif [ "${{ matrix.gpu_arch }}" = "gfx950-rocm720" ]; then | ||
| rocm_tag="rocm720-mi35x" | ||
| else | ||
| echo "Unsupported gfx arch" | ||
| exit 1 | ||
| fi | ||
| tag=v${version}-${rocm_tag} | ||
| docker build . -f docker/rocm720.Dockerfile --build-arg BUILD_TYPE=${{ matrix.build_type }} --build-arg GPU_ARCH=${{ matrix.gpu_arch }} --build-arg ENABLE_MORI=1 --build-arg NIC_BACKEND=ainic --build-arg SETUPTOOLS_SCM_PRETEND_VERSION=${pretend_version} -t rocm/sgl-dev:${tag}-${{ env.DATE }}-preview --no-cache | ||
| docker push rocm/sgl-dev:${tag}-${{ env.DATE }}-preview | ||
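The version and tag composition in the workflow above can be tried locally without Docker or a git checkout. The sketch below is illustrative only: the helper names `compose_pretend_version`, `rocm_tag_for_arch`, and `image_ref` are hypothetical and do not appear in the PR, and the sample tag, date, and commit hash are made-up inputs. It mirrors the string handling in the "Get version from latest tag" and "Build and Push" steps.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not in the PR): strip the leading "v" from a git tag
# and append ".dev<date>+g<commit>", as the PRETEND_VERSION line does.
compose_pretend_version() {
  printf '%s.dev%s+g%s\n' "${1#v}" "$2" "$3"
}

# Hypothetical helper (not in the PR): map a matrix gpu_arch value to the
# image tag suffix, following the if/elif chain in the "Build and Push" step.
rocm_tag_for_arch() {
  case "$1" in
    gfx942-rocm720) echo "rocm720-mi30x" ;;
    gfx950-rocm720) echo "rocm720-mi35x" ;;
    *) echo "Unsupported gfx arch: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper (not in the PR): assemble the full image reference
# in the form rocm/sgl-dev:v<version>-<rocm_tag>-<date>-preview.
image_ref() {
  suffix="$(rocm_tag_for_arch "$2")" || return 1
  printf 'rocm/sgl-dev:v%s-%s-%s-preview\n' "${1#v}" "$suffix" "$3"
}

# Sample inputs (assumed values, for illustration only).
compose_pretend_version v0.5.8 20260211 1a2b3c4
# prints 0.5.8.dev20260211+g1a2b3c4
image_ref v0.5.8 gfx950-rocm720 20260211
# prints rocm/sgl-dev:v0.5.8-rocm720-mi35x-20260211-preview
```

Keeping the arch-to-suffix mapping in one function means an unsupported `gpu_arch` fails before any image name is built, which matches the `exit 1` branch in the workflow.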