[CI] Fix tests/v1/sample/test_logprobs.py #29155

Closed
charlifu wants to merge 2 commits into vllm-project:main from ROCm:amd/fix_test_logprobs
Contributor

@charlifu charlifu commented Nov 21, 2025

Error message:

(EngineCore_DP0 pid=462918) Process EngineCore_DP0:
(EngineCore_DP0 pid=462918) Traceback (most recent call last):
(EngineCore_DP0 pid=462918)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=462918)     self.run()
(EngineCore_DP0 pid=462918)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=462918)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/engine/core.py", line 846, in run_engine_core
(EngineCore_DP0 pid=462918)     raise e
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/engine/core.py", line 835, in run_engine_core
(EngineCore_DP0 pid=462918)     engine_core.run_busy_loop()
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/engine/core.py", line 862, in run_busy_loop
(EngineCore_DP0 pid=462918)     self._process_engine_step()
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/engine/core.py", line 891, in _process_engine_step
(EngineCore_DP0 pid=462918)     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=462918)                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/engine/core.py", line 347, in step
(EngineCore_DP0 pid=462918)     model_output = self.model_executor.sample_tokens(grammar_output)
(EngineCore_DP0 pid=462918)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/executor/uniproc_executor.py", line 107, in sample_tokens
(EngineCore_DP0 pid=462918)     return self.collective_rpc(
(EngineCore_DP0 pid=462918)            ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=462918)     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=462918)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=462918)     return func(*args, **kwargs)
(EngineCore_DP0 pid=462918)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918)   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=462918)     return func(*args, **kwargs)
(EngineCore_DP0 pid=462918)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/worker/gpu_worker.py", line 519, in sample_tokens
(EngineCore_DP0 pid=462918)     return self.model_runner.sample_tokens(grammar_output)
(EngineCore_DP0 pid=462918)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918)   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=462918)     return func(*args, **kwargs)
(EngineCore_DP0 pid=462918)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 3001, in sample_tokens
(EngineCore_DP0 pid=462918)     ) = self._bookkeeping_sync(
(EngineCore_DP0 pid=462918)         ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 2570, in _bookkeeping_sync
(EngineCore_DP0 pid=462918)     logprobs_tensors.tolists(cu_num_accepted_tokens)
(EngineCore_DP0 pid=462918)   File "/workspace/vllm/vllm/v1/outputs.py", line 67, in tolists
(EngineCore_DP0 pid=462918)     self.logprobs.cpu().numpy(),
(EngineCore_DP0 pid=462918)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=462918) TypeError: Got unsupported ScalarType BFloat16

Signed-off-by: charlifu <charlifu@amd.com>
@mergify mergify bot added the v1 label Nov 21, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a TypeError that occurs when converting a bfloat16 tensor to a NumPy array for log probabilities. The fix, which involves casting the logprobs tensor to float32 using .float() before calling .numpy(), is correct and directly resolves the issue described in the traceback. NumPy does not support the bfloat16 dtype, so this conversion is necessary. The change is well-targeted and I don't see any potential negative side effects. The code is now more robust for models using bfloat16 precision.
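The failure and the cast described in this review can be reproduced in isolation. The following is a minimal sketch, assuming PyTorch and NumPy are installed; `logprobs_to_numpy` is a hypothetical helper written for illustration and is not a function in vLLM.

```python
import torch


def logprobs_to_numpy(logprobs: torch.Tensor):
    """Hypothetical helper (not part of vLLM): convert a logprobs tensor
    to a NumPy array regardless of its floating-point dtype."""
    # NumPy has no native bfloat16 dtype, so upcast to float32 first.
    # Every bfloat16 value is exactly representable in float32.
    return logprobs.cpu().float().numpy()


# A small bfloat16 tensor standing in for the logprobs output.
bf16 = torch.tensor([-0.5, -1.25, -3.0], dtype=torch.bfloat16)

# Direct conversion fails with the TypeError shown in the traceback.
try:
    bf16.cpu().numpy()
    raised = False
except TypeError:
    raised = True

# Conversion succeeds once the tensor is cast to float32.
arr = logprobs_to_numpy(bf16)
```

For a tensor that is already float32, `.float()` returns the tensor unchanged, so the extra call should cost nothing in that case.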

Collaborator

@gshtras gshtras left a comment

Assuming the other tensors in this function already have a NumPy-compatible dtype.

@mgoin mgoin requested review from Jialin and njhill November 22, 2025 23:12
Member

mgoin commented Nov 22, 2025

cc @njhill @Jialin

@mgoin mgoin added the ready and ci-failure labels Nov 22, 2025
Collaborator

@Jialin Jialin left a comment

LGTM. Thanks for the fix.

@charlifu Does this break an existing CI test? I'm wondering why CI didn't catch it in the first place.

Member

@njhill njhill left a comment

Thanks @charlifu ... I think this should already be fixed by #29216. Could you check with latest main first?

Labels

ci-failure, ready, v1

Projects

Status: Done

5 participants