[ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm#31820
Conversation
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request introduces a workaround for numerical precision issues on ROCm that are causing test flakiness. The change involves modifying PyTorch's Scaled Dot-Product Attention (SDP) and matrix multiplication precision settings. While the fix is necessary, the current implementation using pytest_sessionstart alters global state without reverting it, which could unintentionally affect other tests in the suite. I have provided a suggestion to refactor this into a module-scoped autouse fixture. This is a safer, more idiomatic approach in pytest for managing test-specific setup and teardown, ensuring the changes are properly isolated.
```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""Pytest configuration for vLLM language generation tests."""

import warnings

import torch

from vllm.platforms import current_platform


def pytest_sessionstart(session):
    """Configure ROCm-specific settings before test session starts."""
    if not current_platform.is_rocm():
        return

    # Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers
    # accuracy issues: https://github.com/vllm-project/vllm/issues/30167
    # TODO: Remove once ROCm SDP accuracy issues are resolved on HuggingFace
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.set_float32_matmul_precision("highest")
    warnings.warn(
        "ROCm: Disabled flash_sdp and mem_efficient_sdp, enabled math_sdp "
        "to avoid HuggingFace Transformers accuracy issues",
        UserWarning,
        stacklevel=1,
    )
```
Using pytest_sessionstart to modify global state like torch settings can have unintended side effects on other tests that run in the same session, as these settings are not reverted. This can lead to slower execution or unexpected behavior in unrelated tests.
A more robust and idiomatic pytest approach is to use a fixture with autouse=True and an appropriate scope (e.g., module). This ensures that the settings are applied only for the relevant tests and, crucially, that the original settings are restored after the tests in the module have completed, preventing any impact on other parts of the test suite.
I've suggested a refactoring to use a module-scoped autouse fixture which encapsulates the setup and teardown logic cleanly.
```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""Pytest configuration for vLLM language pooling tests."""

import warnings

import pytest
import torch

from vllm.platforms import current_platform


@pytest.fixture(scope="module", autouse=True)
def rocm_precision_workaround():
    """Workaround for numerical precision issues on ROCm for pooling tests."""
    if not current_platform.is_rocm():
        yield
        return

    # Save original settings
    orig_flash = torch.backends.cuda.flash_sdp_enabled()
    orig_mem_eff = torch.backends.cuda.mem_efficient_sdp_enabled()
    orig_math = torch.backends.cuda.math_sdp_enabled()
    orig_matmul_precision = torch.get_float32_matmul_precision()

    try:
        # Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers
        # accuracy issues: https://github.com/vllm-project/vllm/issues/30167
        # TODO: Remove once ROCm SDP accuracy issues are resolved on HuggingFace
        torch.backends.cuda.enable_flash_sdp(False)
        torch.backends.cuda.enable_mem_efficient_sdp(False)
        torch.backends.cuda.enable_math_sdp(True)
        torch.set_float32_matmul_precision("highest")
        warnings.warn(
            "ROCm: Disabled flash_sdp and mem_efficient_sdp, enabled math_sdp "
            "to avoid HuggingFace Transformers accuracy issues for pooling tests.",
            UserWarning,
            stacklevel=2,
        )
        yield
    finally:
        # Restore original settings
        torch.backends.cuda.enable_flash_sdp(orig_flash)
        torch.backends.cuda.enable_mem_efficient_sdp(orig_mem_eff)
        torch.backends.cuda.enable_math_sdp(orig_math)
        torch.set_float32_matmul_precision(orig_matmul_precision)
```
That is a bit too overengineered, and might not even be functional. The problem is in one specific test inside the Language Models Test (Extended Pooling) group.
Just to be sure, is this still needed after #31776?
@DarkLight1337 Thank you for pointing this PR out to me; I was not aware of it. I'm going to check the recent changes and then likely close this PR. I already see that Flex Attention has been completely removed from the ROCm attention dispatch mechanism.
… fp acc Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@DarkLight1337 there was still a small error on ROCm; this PR addresses it.
…y on ROCm (vllm-project#31820) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
`test_modernbert_models` fails sometimes on ROCm due to numerical precision differences between vLLM's custom kernels and HuggingFace eager attention, with a max diff of ~0.03 exceeding the 0.01 threshold in only 2 floats.

Root Cause
ROCm's default matmul precision settings produce slightly different numerical results.
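For illustration only (the values below are hypothetical, not taken from the actual test), this is the kind of max-abs-diff tolerance check that a couple of outlier elements can trip:

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical vLLM vs. HF eager outputs: most elements agree to ~0.002,
# but a single ~0.03 outlier exceeds the 0.01 threshold and fails the test.
vllm_logits = [0.120, -0.450, 0.981]
hf_logits = [0.118, -0.448, 1.011]

print(max_abs_diff(vllm_logits, hf_logits) > 0.01)  # prints True
```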
Testing
Ran the test 100+ times in a loop, clearing the cache between runs so the model recompiles from scratch.