
[CI] Use float16 for test_bad_words to ensure compatibility with XPUs #27089

Closed
faaany wants to merge 2 commits into vllm-project:main from faaany:fix_ut

Conversation


@faaany commented Oct 17, 2025

Recent updates to the UT model from PR #2405 have caused the test_bad_words unit test to fail in the XPU CI. The failure is not due to the bad_words feature itself, but rather due to cumulative precision issues introduced by the updated tiny model with BF16 when running on XPUs.

This PR updates the test_bad_words unit test to use the float16 data type for the model, which ensures compatibility and numerical stability on Intel XPUs. Switching to float16 resolves the precision discrepancies observed with bfloat16 on XPUs while preserving the correctness of the test.

The updated test has been validated across all supported platforms (CPU, GPU, and XPU) to ensure consistent behavior.
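The cumulative bf16 precision issue described above can be illustrated with a small, self-contained sketch (the `to_bf16` helper is hypothetical, not part of vLLM): bfloat16 keeps only 7 explicit mantissa bits versus float16's 10, so updates smaller than the bf16 spacing are rounded away entirely, and such losses can accumulate across layers.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision by keeping the top 16 bits
    of its float32 encoding (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# bf16 spacing at 1.0 is 2**-7, so an increment of 2**-9 is lost entirely,
# while float16 (spacing 2**-10 at 1.0) would still represent it:
print(to_bf16(1.0 + 2**-9))  # 1.0 -- the update vanishes
print(to_bf16(1.0 + 2**-7))  # 1.0078125 -- on the bf16 grid, survives

# Accumulating many tiny updates in bf16 stalls well short of the true sum:
acc = 0.0
for _ in range(1000):
    acc = to_bf16(acc + 0.001)
print(acc)  # far below the exact result of 1.0
```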

Test

pytest -rA tests/v1/sample/test_sampling_params_e2e.py

Test results on both CUDA and XPU

================================================ short test summary info ================================================
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_n_gt_1
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_best_of
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_penalties
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_stop
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_stop_token_ids
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_detokenize_false
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_bad_words
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_logits_processor
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_allowed_token_ids
PASSED tests/v1/sample/test_sampling_params_e2e.py::test_seed
============================================ 10 passed, 2 warnings in 36.30s ============================================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Fanli Lin <fanli.lin@intel.com>
@mergify bot added the v1 label Oct 17, 2025

faaany commented Oct 17, 2025

cc @jikunshang @yma11

@gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a failing test on XPU by changing the data type to float16. The proposed change modifies a shared test fixture, which affects all tests in the file, not just the failing one. While this resolves the immediate issue, it introduces a potential risk by reducing precision for all tests, which could lead to future maintenance and correctness issues. My review includes a suggestion to isolate this change to only the affected test for a safer and more maintainable solution.

@pytest.fixture(scope="module")
def llm() -> LLM:
-    return LLM(MODEL, enforce_eager=True)
+    return LLM(MODEL, enforce_eager=True, dtype="half")
Severity: high

This change to dtype="half" affects all 10 tests in this file since they share the llm fixture. While this fixes test_bad_words on XPU, applying a precision-reducing change this broadly is risky. It could introduce subtle, hard-to-debug numerical issues in other tests now or in the future, especially on different hardware.

For better maintainability and to prevent unintended side effects, it's safer to isolate this change. I recommend creating a new fixture with dtype="half" and using it only for test_bad_words.

For example:

@pytest.fixture(scope="module")
def llm_half() -> LLM:
    return LLM(MODEL, enforce_eager=True, dtype="half")

# ... in test_bad_words:
def test_bad_words(llm_half):
    # ...

This would require updating test_bad_words to use the new fixture, but it makes the change's scope and purpose explicit and avoids impacting other tests.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 12 to +14
@pytest.fixture(scope="module")
def llm() -> LLM:
-    return LLM(MODEL, enforce_eager=True)
+    return LLM(MODEL, enforce_eager=True, dtype="half")


P1: Forcing float16 breaks CPU architectures lacking fp16 support

The fixture now constructs the shared LLM with dtype="half" unconditionally. Previously the default auto path selected the first value from current_platform.supported_dtypes, which resolves to float32 on platforms that disable fp16 (e.g. PowerPC and RISC‑V per the logic in vllm/platforms/cpu.py). Hard‑coding half bypasses that guard and will attempt to load the model in float16 on those architectures, reintroducing the segmentation faults and ValueErrors the auto logic avoided. Consider selecting half only when the current platform reports fp16 as supported or limit the dtype change to the single test_bad_words case instead of the whole module.
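The platform-gated selection Codex suggests could be sketched as a small helper (the function name and string values here are assumptions for illustration; vLLM's actual selection logic lives in its platform code):

```python
def select_test_dtype(supported_dtypes: list[str]) -> str:
    """Return "half" only when the platform reports fp16 support;
    otherwise fall back to the platform's first (default) dtype."""
    if "float16" in supported_dtypes:
        return "half"
    # e.g. PowerPC / RISC-V CPU builds may expose only float32
    return supported_dtypes[0]

# The fixture could then pass the result as dtype=, e.g.:
#   LLM(MODEL, enforce_eager=True, dtype=select_test_dtype(...))
print(select_test_dtype(["float16", "bfloat16", "float32"]))  # half
print(select_test_dtype(["float32"]))  # float32
```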


@jikunshang
Collaborator

Maybe this is due to a kernel accuracy issue with bf16 on Intel GPUs.


@faaany commented Oct 20, 2025

Verified that the root cause is a bf16 accuracy issue in the ipex rms_norm kernel. Closing.

@faaany closed this Oct 20, 2025
