[Hardware][AMD][Bugfix] Fix spec decode acceptance length test#32825
Closed
mawong-amd wants to merge 1 commit intovllm-project:mainfrom
Closed
[Hardware][AMD][Bugfix] Fix spec decode acceptance length test#32825mawong-amd wants to merge 1 commit intovllm-project:mainfrom
mawong-amd wants to merge 1 commit intovllm-project:mainfrom
Conversation
|
This pull request has merge conflicts that must be resolved before it can be |
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces fixes to enable speculative decoding acceptance length tests on ROCm platforms. The key changes include correcting the attention backend detection logic in test_acceptance_length.py by replacing a faulty hasattr check and adding a specific path for ROCm. Additionally, a bug in eagle.py's dummy_run is fixed to correctly handle cases where CUDA graphs are disabled. The changes are logical, well-targeted, and improve compatibility with ROCm. I have reviewed the changes and they look good to me.
3a4c5d3 to
a39c272
Compare
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
a39c272 to
ec3ec3e
Compare
Contributor
Author
|
Closing as a duplicate of #32787 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
This PR fixes the speculative decoding acceptance length tests, first implemented in #32030, to run on ROCm.
The current implementation fails because it erroneously calls
current_platform.get_valid_backendswhich is not implemented on non-CUDA platforms. It guards this call behindhasattr(current_platform, "get_valid_backends"), but this does not work because thePlatformclass implements a__getattr__method which meanshasattr(current_platform, attribute)will trivially always return True for any attribute, even though the value of said attribute may beNone.Test Plan
pytest -sv tests/v1/spec_decode/test_acceptance_length.pyon AMD ROCm.Test Result
The above test passes on MI300X and MI325X.
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.