[CI] Use float16 for test_bad_words to ensure compatibility with XPUs #27089

faaany wants to merge 2 commits into vllm-project:main

Conversation
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Code Review
This pull request aims to fix a failing test on XPU by changing the data type to float16. The proposed change modifies a shared test fixture, which affects all tests in the file, not just the failing one. While this resolves the immediate issue, it introduces a potential risk by reducing precision for all tests, which could lead to future maintenance and correctness issues. My review includes a suggestion to isolate this change to only the affected test for a safer and more maintainable solution.
```diff
 @pytest.fixture(scope="module")
 def llm() -> LLM:
-    return LLM(MODEL, enforce_eager=True)
+    return LLM(MODEL, enforce_eager=True, dtype="half")
```
This change to dtype="half" affects all 10 tests in this file since they share the llm fixture. While this fixes test_bad_words on XPU, applying a precision-reducing change this broadly is risky. It could introduce subtle, hard-to-debug numerical issues in other tests now or in the future, especially on different hardware.
For better maintainability and to prevent unintended side effects, it's safer to isolate this change. I recommend creating a new fixture with dtype="half" and using it only for test_bad_words.
For example:

```python
@pytest.fixture(scope="module")
def llm_half() -> LLM:
    return LLM(MODEL, enforce_eager=True, dtype="half")


# ... in test_bad_words:
def test_bad_words(llm_half):
    ...
```

This would require updating test_bad_words to use the new fixture, but it makes the change's scope and purpose explicit and avoids impacting other tests.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 @pytest.fixture(scope="module")
 def llm() -> LLM:
-    return LLM(MODEL, enforce_eager=True)
+    return LLM(MODEL, enforce_eager=True, dtype="half")
```
Forcing float16 breaks CPU architectures lacking fp16 support
The fixture now constructs the shared LLM with dtype="half" unconditionally. Previously the default auto path selected the first value from current_platform.supported_dtypes, which resolves to float32 on platforms that disable fp16 (e.g. PowerPC and RISC‑V per the logic in vllm/platforms/cpu.py). Hard‑coding half bypasses that guard and will attempt to load the model in float16 on those architectures, reintroducing the segmentation faults and ValueErrors the auto logic avoided. Consider selecting half only when the current platform reports fp16 as supported or limit the dtype change to the single test_bad_words case instead of the whole module.
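One way to follow this suggestion is a small helper that opts into float16 only when the platform reports it as supported, and otherwise falls back to the default selection. This is an illustrative sketch using a plain list of dtype names as a stand-in for the platform's supported-dtype list (the helper name and its input shape are assumptions, not vLLM's actual API):

```python
def pick_test_dtype(supported_dtypes: list[str]) -> str:
    """Return "half" only when the platform supports float16.

    `supported_dtypes` is a hypothetical stand-in for the platform's
    supported-dtype list; on platforms without fp16 (e.g. some CPU
    architectures) we fall back to the default "auto" selection so the
    existing guard logic still applies.
    """
    return "half" if "float16" in supported_dtypes else "auto"
```

With this, the fixture could pass `dtype=pick_test_dtype(...)` instead of hard-coding `"half"`, keeping the float32 fallback on platforms that disable fp16.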
Maybe this is due to a kernel accuracy issue with bf16 on Intel GPU.

Verified that the root cause is an IPEX rms_norm kernel bf16 accuracy issue. Closing.
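For context, a minimal pure-Python reference of the RMSNorm computation that such a kernel implements — this is an illustrative sketch, not the IPEX kernel, and the epsilon value is an assumption:

```python
import math


def rms_norm(x, weight=None, eps=1e-6):
    """Reference RMSNorm: scale x by the reciprocal of its root-mean-square.

    A low-precision (bf16) kernel accumulating the sum of squares here can
    lose accuracy, which is the kind of discrepancy described above.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

For example, `rms_norm([1.0, 2.0, 3.0])` scales each element by roughly 0.4629, the reciprocal of the root-mean-square of the input.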
Recent updates to the UT model from PR #2405 have caused the test_bad_words unit test to fail in the XPU CI. The failure is not due to the bad_words feature itself, but rather to cumulative precision issues introduced by the updated tiny model with BF16 when running on XPUs.

This PR updates the test_bad_words unit test to use the float16 data type for the model. The change ensures compatibility and numerical stability on Intel XPUs. Switching to float16 resolves the precision discrepancies observed with bfloat16 on XPUs while preserving the correctness of the test.

The updated test has been validated across all supported platforms (CPU, GPU, and XPU) to ensure consistent behavior.
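To illustrate why switching from bfloat16 to float16 helps: bf16 keeps only 7 mantissa bits versus fp16's 10, so per-element rounding error is larger and can accumulate across layers. A stdlib-only sketch that simulates bf16 by truncating a float32 to its top 16 bits (real hardware may round-to-nearest instead; this is a simplification for illustration):

```python
import struct


def to_bf16(x: float) -> float:
    """Round-trip x through a simulated bfloat16 by truncating a
    float32 to its upper 16 bits (sign + exponent + 7 mantissa bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack(">e", struct.pack(">e", x))[0]


x = 1.01
bf16_err = abs(to_bf16(x) - x)
fp16_err = abs(to_fp16(x) - x)
# fp16's extra mantissa bits give a noticeably smaller rounding error
```

Both formats share float32's exponent range characteristics differently — bf16 keeps the full float32 exponent range while fp16 trades range for precision, which is why fp16 behaves better for this accuracy-sensitive tiny-model test.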
Test
Test Result on both CUDA and XPU