2 changes: 1 addition & 1 deletion tests/v1/sample/test_sampling_params_e2e.py
@@ -11,7 +11,7 @@

@pytest.fixture(scope="module")
def llm() -> LLM:
-    return LLM(MODEL, enforce_eager=True)
+    return LLM(MODEL, enforce_eager=True, dtype="half")
Contributor


Severity: high

This change to dtype="half" affects all 10 tests in this file since they share the llm fixture. While this fixes test_bad_words on XPU, applying a precision-reducing change this broadly is risky. It could introduce subtle, hard-to-debug numerical issues in other tests now or in the future, especially on different hardware.

For better maintainability and to prevent unintended side effects, it's safer to isolate this change. I recommend creating a new fixture with dtype="half" and using it only for test_bad_words.

For example:

```python
@pytest.fixture(scope="module")
def llm_half() -> LLM:
    return LLM(MODEL, enforce_eager=True, dtype="half")


# ... then have test_bad_words take the new fixture instead of `llm`:
def test_bad_words(llm_half):
    ...
```

This would require updating test_bad_words to use the new fixture, but it makes the change's scope and purpose explicit and avoids impacting other tests.

Comment on lines 12 to +14
P1: Forcing float16 breaks CPU architectures lacking fp16 support

The fixture now constructs the shared LLM with dtype="half" unconditionally. Previously the default auto path selected the first value from current_platform.supported_dtypes, which resolves to float32 on platforms that disable fp16 (e.g. PowerPC and RISC‑V, per the logic in vllm/platforms/cpu.py). Hard‑coding half bypasses that guard and will attempt to load the model in float16 on those architectures, reintroducing the segmentation faults and ValueErrors the auto logic avoided. Consider selecting half only when the current platform reports fp16 as supported, or limiting the dtype change to the single test_bad_words case instead of the whole module.
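The suggested guard could be sketched roughly as follows. This is a minimal illustration, not vLLM code: the `pick_dtype` helper is hypothetical, and platform dtype support is modeled as plain strings rather than the torch dtypes that `current_platform.supported_dtypes` actually holds.

```python
def pick_dtype(supported_dtypes):
    """Choose "half" only when the platform reports fp16 support.

    Falls back to "auto" so the platform's own selection logic can
    still resolve to float32 on architectures that disable fp16.
    """
    return "half" if "float16" in supported_dtypes else "auto"


# On an fp16-capable platform the fixture would get dtype="half";
# on a platform without fp16 it keeps the safe "auto" default.
dtype = pick_dtype(["float16", "bfloat16", "float32"])
```

The fixture would then pass `dtype=dtype` to `LLM(...)` instead of hard-coding `"half"`.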




def test_n_gt_1(llm):