[FEAT][ROCm] Integrate Fused MoE Kernels from AITER #14967
Merged: DarkLight1337 merged 32 commits into `vllm-project:main` from `EmbeddedLLM:aiter-fmoe-integration` on Mar 26, 2025.
Commits:
- 4c296ae add AITER in rocm docker base file (vllmellm)
- 8761424 add AITER fused moe kernels (vllmellm)
- 18e0717 add preprocessing steps required when using AITER moe kernels (vllmellm)
- 19b0cd2 add required ENV variables to enabled AITER ops (vllmellm)
- 38d5995 add test for fused moe dispatcher logic (vllmellm)
- 6028eab bugfix: update aiter moe enable check (vllmellm)
- fab94ea add end to end model test when AITER ops are enabled for rocm (vllmellm)
- 8e419df fix pre-commit errors (vllmellm)
- d78a2ae enable AITER for rocm platform in more tests (vllmellm)
- 06c92e6 enable AITER for rocm platform in related tests cases for fp8 quant (vllmellm)
- 8976e55 bugfix AITER block scaled moe wrong depency on a wrong envs variable (vllmellm)
- 8109aa0 Merge branch 'vllm-project:main' into aiter-fmoe-integration (vllmellm)
- 4d8d15b separate out the moe kernels from aiter into different file (vllmellm)
- 4b942b7 Merge branch 'aiter-fmoe-integration' of https://github.com/EmbeddedL… (vllmellm)
- c069a66 move AITER moe enability check from top of file into function level s… (vllmellm)
- 4047344 fix AITER Fused MoE distpatcher tests (vllmellm)
- 547464d fix get envs variables in unit tests (vllmellm)
- b9158ad Merge remote-tracking branch 'origin/main' into aiter-fmoe-integration (tjtanaa)
- fab7511 remove cascading logic from vllm.envs (vllmellm)
- f7fffa0 move out the processing weights required for AITER MoE (vllmellm)
- aa38d95 refactor aiter unit test flags into decorator (tjtanaa)
- 7d8707b modify the rocm AITER check tests based on new decorator and include … (vllmellm)
- fd36f6c update run-amd-test.sh; fix skip rocm aiter test flag (tjtanaa)
- 0b55c4c Merge remote-tracking branch 'origin/main' into aiter-fmoe-integration (vllmellm)
- b8dd58a bugfix topk softmax functions to return the tensors (vllmellm)
- d2f86c0 remove unused tests for AITER MoE and keep only mixtral moe unit test (vllmellm)
- 3f230d7 Merge remote-tracking branch 'origin/main' into aiter-fmoe-integration (vllmellm)
- 91d0bda Merge remote-tracking branch 'origin/main' into aiter-fmoe-integration (vllmellm)
- 05734e4 fix test cases in test_fp8.py to test AITER ops enability for load an… (vllmellm)
- f242bf2 remove the extra line gaps and revert the test_phimoe.py to its origi… (vllmellm)
- 598dec9 Merge remote-tracking branch 'origin/main' into aiter-fmoe-integration (vllmellm)
- 61edbd4 match the VLLM_ROCM_USE_AITER_FP8_BLOCK_SCALED_MOE variable in envs t… (vllmellm)

    @@ -5,13 +5,15 @@
     """
     import copy
     import json
    +import os
     
     import jsonschema
     import jsonschema.exceptions
     import pytest
     
     from vllm.entrypoints.openai.tool_parsers.mistral_tool_parser import (  # noqa
         MistralToolParser)
    +from vllm.platforms import current_platform
     from vllm.sampling_params import GuidedDecodingParams, SamplingParams
     
     from ...utils import check_logprobs_close
    @@ -174,15 +176,16 @@
     @pytest.mark.parametrize("dtype", ["bfloat16"])
     @pytest.mark.parametrize("max_tokens", [64])
     @pytest.mark.parametrize("num_logprobs", [5])
    -def test_models(
    -    hf_runner,
    -    vllm_runner,
    -    example_prompts,
    -    model: str,
    -    dtype: str,
    -    max_tokens: int,
    -    num_logprobs: int,
    -) -> None:
    +@pytest.mark.parametrize(
    +    "use_rocm_aiter", [True, False] if current_platform.is_rocm() else [False])
    +def test_models(hf_runner, vllm_runner, example_prompts, model: str,
    +                dtype: str, max_tokens: int, num_logprobs: int,
    +                use_rocm_aiter: bool, monkeypatch) -> None:
    +    if use_rocm_aiter:
    +        if os.getenv("SKIP_ROCM_ATIER_MODEL_TEST_CASES") == "true":
    +            pytest.skip("Skipping test suite for ROCM AITER")
    +        monkeypatch.setenv("VLLM_ROCM_USE_AITER", "1")
    +
         # TODO(sang): Sliding window should be tested separately.
         with hf_runner(model, dtype=dtype) as hf_model:
             hf_outputs = hf_model.generate_greedy_logprobs_limit(
    @@ -206,14 +209,16 @@ def test_models(
     @pytest.mark.parametrize("dtype", ["bfloat16"])
     @pytest.mark.parametrize("max_tokens", [64])
     @pytest.mark.parametrize("num_logprobs", [5])
    -def test_mistral_format(
    -    vllm_runner,
    -    example_prompts,
    -    model: str,
    -    dtype: str,
    -    max_tokens: int,
    -    num_logprobs: int,
    -) -> None:
    +@pytest.mark.parametrize(
    +    "use_rocm_aiter", [True, False] if current_platform.is_rocm() else [False])
    +def test_mistral_format(vllm_runner, example_prompts, model: str, dtype: str,
    +                        max_tokens: int, num_logprobs: int,
    +                        use_rocm_aiter: bool, monkeypatch) -> None:
    +    if use_rocm_aiter:
    +        if os.getenv("SKIP_ROCM_ATIER_MODEL_TEST_CASES") == "true":
    +            pytest.skip("Skipping test suite for ROCM AITER")
    +        monkeypatch.setenv("VLLM_ROCM_USE_AITER", "1")
    +
         with vllm_runner(
                 model,
                 dtype=dtype,
    @@ -244,11 +249,15 @@ def test_mistral_format(
     
     @pytest.mark.parametrize("model", MISTRAL_FORMAT_MODELS)
     @pytest.mark.parametrize("dtype", ["bfloat16"])
    -def test_mistral_symbolic_languages(
    -    vllm_runner,
    -    model: str,
    -    dtype: str,
    -) -> None:
    +@pytest.mark.parametrize(
    +    "use_rocm_aiter", [True, False] if current_platform.is_rocm() else [False])
    +def test_mistral_symbolic_languages(vllm_runner, model: str, dtype: str,
    +                                    use_rocm_aiter: bool, monkeypatch) -> None:
    +    if use_rocm_aiter:
    +        if os.getenv("SKIP_ROCM_ATIER_MODEL_TEST_CASES") == "true":
    +            pytest.skip("Skipping test suite for ROCM AITER")
    +        monkeypatch.setenv("VLLM_ROCM_USE_AITER", "1")
    +
         with vllm_runner(model,
                          dtype=dtype,
                          max_model_len=8192,
    @@ -266,11 +275,15 @@ def test_mistral_symbolic_languages(
     @pytest.mark.parametrize("dtype", ["bfloat16"])
     @pytest.mark.parametrize("model",
                              MISTRAL_FORMAT_MODELS)  # v1 can't do func calling
    -def test_mistral_function_calling(
    -    vllm_runner,
    -    model: str,
    -    dtype: str,
    -) -> None:
    +@pytest.mark.parametrize(
    +    "use_rocm_aiter", [True, False] if current_platform.is_rocm() else [False])
    +def test_mistral_function_calling(vllm_runner, model: str, dtype: str,
    +                                  use_rocm_aiter: bool, monkeypatch) -> None:
    +    if use_rocm_aiter:
    +        if os.getenv("SKIP_ROCM_ATIER_MODEL_TEST_CASES") == "true":
    +            pytest.skip("Skipping test suite for ROCM AITER")
    +        monkeypatch.setenv("VLLM_ROCM_USE_AITER", "1")
    +
         with vllm_runner(model,
                          dtype=dtype,
                          tokenizer_mode="mistral",
    @@ -301,11 +314,15 @@ def test_mistral_function_calling(
     @pytest.mark.parametrize("model", MODELS)
     @pytest.mark.parametrize("guided_backend",
                              ["outlines", "lm-format-enforcer", "xgrammar"])
    -def test_mistral_guided_decoding(
    -    vllm_runner,
    -    model: str,
    -    guided_backend: str,
    -) -> None:
    +@pytest.mark.parametrize(
    +    "use_rocm_aiter", [True, False] if current_platform.is_rocm() else [False])
    +def test_mistral_guided_decoding(vllm_runner, model: str, guided_backend: str,
    +                                 use_rocm_aiter: bool, monkeypatch) -> None:
    +    if use_rocm_aiter:
    +        if os.getenv("SKIP_ROCM_ATIER_MODEL_TEST_CASES") == "true":
    +            pytest.skip("Skipping test suite for ROCM AITER")
    +        monkeypatch.setenv("VLLM_ROCM_USE_AITER", "1")
    +
         with vllm_runner(model, dtype='bfloat16',
                          tokenizer_mode="mistral") as vllm_model:

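Every test in the diff repeats the same preamble: parametrize `use_rocm_aiter` over both values on ROCm only, skip the AITER case when `SKIP_ROCM_ATIER_MODEL_TEST_CASES` is `"true"`, and otherwise set `VLLM_ROCM_USE_AITER=1`. The following is a minimal sketch of that decision pulled into plain functions. `aiter_param_values` and `aiter_env_overrides` are hypothetical helpers and are not part of vLLM; only the two environment variable names come from the diff. The skip variable is spelled `ATIER`, not `AITER`, so anything that sets it in CI has to use that exact spelling.

```python
import os
from typing import Dict, List, Mapping, Optional

# Variable names copied verbatim from the diff (note the "ATIER" spelling).
SKIP_ENV_VAR = "SKIP_ROCM_ATIER_MODEL_TEST_CASES"
USE_AITER_ENV_VAR = "VLLM_ROCM_USE_AITER"


def aiter_param_values(is_rocm: bool) -> List[bool]:
    """Values for the `use_rocm_aiter` parametrization (hypothetical helper).

    On ROCm both paths are exercised; on other platforms only the default.
    """
    return [True, False] if is_rocm else [False]


def aiter_env_overrides(
        use_rocm_aiter: bool,
        environ: Optional[Mapping[str, str]] = None
) -> Optional[Dict[str, str]]:
    """Mirror the per-test preamble from the diff (hypothetical helper).

    Returns None when the case should be skipped, otherwise the environment
    variables the test should set (an empty dict when AITER is not used).
    """
    env = os.environ if environ is None else environ
    if not use_rocm_aiter:
        return {}
    if env.get(SKIP_ENV_VAR) == "true":
        return None
    return {USE_AITER_ENV_VAR: "1"}
```

Inside a test, the helper would stand in for the four-line preamble: compute `overrides = aiter_env_overrides(use_rocm_aiter)`, call `pytest.skip(...)` when it is `None`, and otherwise pass each key/value pair to `monkeypatch.setenv`. Taking `environ` as an optional argument keeps the decision testable without touching the real process environment.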
I actually like @DarkLight1337's feedback on #14959 to use pytest custom markers, instead of an environment variable, to selectively enable/disable these tests.
I assume we are disabling these because AITER isn't built in CI? If so, we should change that :). I'm under the impression that CI just uses the ROCm Dockerfile, which you've updated to include AITER, but I could be mistaken.
We tried to introduce a pytest marker for `use_rocm_aiter` in a minimal way, without changing the Buildkite command, e.g.
`pytest -v -s models/decoder_only/language -m 'core_model or quant_model'` from `vllm/.buildkite/test-pipeline.yaml` (line 395 in 61f4121).
@DarkLight1337 @SageMoore
Do you have a recommendation for how we should use a pytest marker without affecting the commands in Buildkite?
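One way a marker could coexist with the unchanged `-m 'core_model or quant_model'` command is sketched below. This is an assumption, not what this PR merged. Only the AITER-enabled parametrization would carry a hypothetical `rocm_aiter` marker, for example `[pytest.param(True, marks=pytest.mark.rocm_aiter), False]` in place of `[True, False]`. A `conftest.py` hook then deselects those cases when the existing skip variable is set. Because `-m` still selects on `core_model`/`quant_model`, the Buildkite command stays the same, and the opt-out signal stays in the environment (for example, the AMD test script).

```python
# conftest.py -- hypothetical sketch; the marker name is an assumption.
import os

AITER_MARKER = "rocm_aiter"
SKIP_ENV_VAR = "SKIP_ROCM_ATIER_MODEL_TEST_CASES"  # spelling as in the diff


def pytest_configure(config):
    # Register the marker so `--strict-markers` does not reject it.
    config.addinivalue_line(
        "markers", f"{AITER_MARKER}: case runs with VLLM_ROCM_USE_AITER=1")


def pytest_collection_modifyitems(config, items):
    # Leave collection untouched unless the environment opts out.
    if os.environ.get(SKIP_ENV_VAR) != "true":
        return
    kept, deselected = [], []
    for item in items:
        if item.get_closest_marker(AITER_MARKER) is not None:
            deselected.append(item)
        else:
            kept.append(item)
    if deselected:
        # Report them as deselected (as `-m` does) rather than skipped.
        config.hook.pytest_deselected(items=deselected)
        items[:] = kept
```

Compared with calling `pytest.skip` inside each test, deselecting at collection time keeps the test bodies free of environment checks. The marker also documents, next to the parametrization, which cases exercise AITER.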
@SageMoore @DarkLight1337 Since we have been ensuring that the unit tests pass on a particular AITER commit, we will enable the AITER kernel tests by default. In that case we don't need a way to disable the AITER test cases, which also removes the need for a pytest marker or any other decorator.
The AITER commits are specified in `Dockerfile.rocm_base`. So, is it OK to keep it as follows?