[ROCm][CI] Fix spec decode profile assertion and logprob test determinism#35043
vllm-bot merged 4 commits into vllm-project:main
Conversation
…ched_tokens Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…rob test Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Hi @AndreasKaratzas, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then commit the changes and push to your branch.
Code Review
The pull request addresses two issues related to speculative decoding on ROCm: an assertion failure in gpu_model_runner.py and non-deterministic logprob comparisons in test_logprobs.py. The changes correctly update the assertion to use self.max_num_tokens and introduce ROCM_DETERMINISM_KWARGS to ensure deterministic execution for logprob tests on ROCm. The changes are well-explained and directly address the identified problems.
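To illustrate why the old assertion was too strict, here is a minimal, self-contained sketch; the class, names, and numbers are hypothetical stand-ins for the real logic in `gpu_model_runner.py`, where `max_num_tokens` is derived elsewhere:

```python
# Hypothetical minimal model of the fixed assertion. With speculative
# decoding, the profile run also covers verification tokens, so the
# effective token bound can exceed the scheduler's batch budget.
class DummyRunner:
    def __init__(self, max_num_batched_tokens: int, num_spec_tokens: int) -> None:
        self.max_num_batched_tokens = max_num_batched_tokens
        # Illustrative only: verification tokens add to the profile size.
        self.max_num_tokens = max_num_batched_tokens + num_spec_tokens

    def dummy_run(self, num_tokens: int) -> bool:
        # Fixed check: bound by max_num_tokens, not max_num_batched_tokens.
        return num_tokens <= self.max_num_tokens


runner = DummyRunner(max_num_batched_tokens=2048, num_spec_tokens=8)
print(runner.dummy_run(2056))  # True under the fix; the old bound would reject it
```

Under the old assertion, `2056 <= 2048` fails even though the profile run legitimately sizes itself to include verification tokens.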
```python
ROCM_DETERMINISM_KWARGS: dict = (
    dict(
        max_num_seqs=1,
    )
    if current_platform.is_rocm()
    else {}
)
```
The ROCM_DETERMINISM_KWARGS dictionary currently only sets max_num_seqs=1. The PR description mentions enforce_eager and async_scheduling=False as part of the determinism kwargs. These should also be included in the dictionary to fully align with the described fix and ensure consistent execution paths on ROCm.
Suggested change:

```python
ROCM_DETERMINISM_KWARGS: dict = (
    dict(
        enforce_eager=True,
        async_scheduling=False,
        max_num_seqs=1,
    )
    if current_platform.is_rocm()
    else {}
)
```
I've updated the description already, apparently those args were unnecessary.
This PR depends on: That's why pre-commit is failing. EDIT: This is no longer true. I have reverted the change.
I'm going to revert the change.
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…nism (vllm-project#35043) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…nism (vllm-project#35043) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Fixes two issues blocking spec decode logprob tests on ROCm:
Profile run assertion failure (`gpu_model_runner.py`): The `_dummy_run` method asserted `num_tokens <= self.scheduler_config.max_num_batched_tokens`, but with speculative decoding `max_num_tokens` (which accounts for verification tokens) can exceed `max_num_batched_tokens`. Updated the assertion to use `self.max_num_tokens`, consistent with the rest of the runner.

Non-deterministic logprob comparison (`test_logprobs.py`): The ref LLM and the spec-decode LLM used different batch sizes, which on ROCm triggers non-associative floating-point reduction differences in attention/GEMM kernels. These numerical divergences were misattributed to spec decode incorrectness. Added `ROCM_DETERMINISM_KWARGS` (`max_num_seqs=1`), applied to both LLM instances on ROCm only, pinning identical execution paths. No behavioral change on other platforms.

Test Plan