
[CI] Use float16 for test_bad_words to ensure compatibility with XPUs #27089

Closed
faaany wants to merge 2 commits into vllm-project:main from faaany:fix_ut

Conversation


@faaany commented Oct 17, 2025

Recent updates to the UT model from PR #2405 have caused the test_bad_words unit test to fail in the XPU CI. The failure is not due to the bad_words feature itself, but rather due to cumulative precision issues introduced by the updated tiny model with BF16 when running on XPUs.

This PR updates the test_bad_words unit test to use the float16 data type for the model, which ensures compatibility and numerical stability on Intel XPUs. Switching to float16 resolves the precision discrepancies observed with bfloat16 on XPUs while preserving the correctness of the test.

The updated test has been validated across all supported platforms (CPU, GPU, and XPU) to ensure consistent behavior.
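The cumulative bf16 precision issue described above can be illustrated with a small, self-contained sketch (the `to_bf16` helper is hypothetical, not part of vLLM): bfloat16 keeps only 7 explicit mantissa bits versus float16's 10, so updates smaller than the bf16 spacing are rounded away entirely, and such losses can accumulate across layers.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision by keeping the top 16 bits
    of its float32 encoding (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# bf16 spacing at 1.0 is 2**-7, so an increment of 2**-9 is lost entirely,
# while float16 (spacing 2**-10 at 1.0) would still represent it:
print(to_bf16(1.0 + 2**-9))  # 1.0 -- the update vanishes
print(to_bf16(1.0 + 2**-7))  # 1.0078125 -- on the bf16 grid, survives

# Accumulating many tiny updates in bf16 stalls well short of the true sum:
acc = 0.0
for _ in range(1000):
    acc = to_bf16(acc + 0.001)
print(acc)  # far below the exact result of 1.0
```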

Test

pytest -rA tests/v1/sample/test_sampling_params_e2e.py

Test results on both CUDA and XPU

================================================ short test summary info ================================================
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_n_gt_1
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_best_of
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_penalties
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_stop
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_stop_token_ids
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_detokenize_false
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_bad_words
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_logits_processor
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_allowed_token_ids
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_seed
============================================ 10 passed, 2 warnings in 36.30s ============================================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Fanli Lin <fanli.lin@intel.com>
@mergify bot added the v1 label Oct 17, 2025

faaany commented Oct 17, 2025

cc @jikunshang @yma11

@gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a failing test on XPU by changing the data type to float16. The proposed change modifies a shared test fixture, which affects all tests in the file, not just the failing one. While this resolves the immediate issue, it introduces a potential risk by reducing precision for all tests, which could lead to future maintenance and correctness issues. My review includes a suggestion to isolate this change to only the affected test for a safer and more maintainable solution.

@pytest.fixture(scope="module")
def llm() -> LLM:
-    return LLM(MODEL, enforce_eager=True)
+    return LLM(MODEL, enforce_eager=True, dtype="half")
Severity: high

This change to dtype="half" affects all 10 tests in this file since they share the llm fixture. While this fixes test_bad_words on XPU, applying a precision-reducing change this broadly is risky. It could introduce subtle, hard-to-debug numerical issues in other tests now or in the future, especially on different hardware.

For better maintainability and to prevent unintended side effects, it's safer to isolate this change. I recommend creating a new fixture with dtype="half" and using it only for test_bad_words.

For example:

@pytest.fixture(scope="module")
def llm_half() -> LLM:
    return LLM(MODEL, enforce_eager=True, dtype="half")

# ... in test_bad_words:
def test_bad_words(llm_half):
    # ...

This would require updating test_bad_words to use the new fixture, but it makes the change's scope and purpose explicit and avoids impacting other tests.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 12 to +14
@pytest.fixture(scope="module")
def llm() -> LLM:
-    return LLM(MODEL, enforce_eager=True)
+    return LLM(MODEL, enforce_eager=True, dtype="half")


P1: Forcing float16 breaks CPU architectures lacking fp16 support

The fixture now constructs the shared LLM with dtype="half" unconditionally. Previously the default auto path selected the first value from current_platform.supported_dtypes, which resolves to float32 on platforms that disable fp16 (e.g. PowerPC and RISC‑V per the logic in vllm/platforms/cpu.py). Hard‑coding half bypasses that guard and will attempt to load the model in float16 on those architectures, reintroducing the segmentation faults and ValueErrors the auto logic avoided. Consider selecting half only when the current platform reports fp16 as supported or limit the dtype change to the single test_bad_words case instead of the whole module.
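The platform-gated selection Codex suggests could be sketched as a small helper (the function name and string values here are assumptions for illustration; vLLM's actual selection logic lives in its platform code):

```python
def select_test_dtype(supported_dtypes: list[str]) -> str:
    """Return "half" only when the platform reports fp16 support;
    otherwise fall back to the platform's first (default) dtype."""
    if "float16" in supported_dtypes:
        return "half"
    # e.g. PowerPC / RISC-V CPU builds may expose only float32
    return supported_dtypes[0]

# The fixture could then pass the result as dtype=, e.g.:
#   LLM(MODEL, enforce_eager=True, dtype=select_test_dtype(...))
print(select_test_dtype(["float16", "bfloat16", "float32"]))  # half
print(select_test_dtype(["float32"]))  # float32
```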


@jikunshang
Collaborator

Maybe this is due to a kernel accuracy issue with bf16 on Intel GPUs.


@faaany commented Oct 20, 2025

Verified that the root cause is a bf16 accuracy issue in the ipex rms_norm kernel. Closing.

@faaany closed this Oct 20, 2025
