[ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm#31820

Merged
robertgshaw2-redhat merged 3 commits into vllm-project:main from ROCm:akaratza_lang_pool
Jan 6, 2026

Conversation

@AndreasKaratzas (Collaborator) commented Jan 6, 2026

test_modernbert_models occasionally fails on ROCm due to numerical precision differences between vLLM's custom kernels and HuggingFace eager attention; the maximum difference (~0.03) exceeds the 0.01 threshold in only 2 elements.
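The failing comparison is torch.testing.assert_close. Its elementwise pass criterion can be sketched in plain Python (scalar form; the atol/rtol values are the ones from the test, the sample inputs are illustrative):

```python
def within_tolerance(actual: float, expected: float,
                     atol: float = 1.2e-2, rtol: float = 1e-3) -> bool:
    """Sketch of assert_close's elementwise criterion:
    |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)

# A diff of ~0.033 (as reported on ROCm) fails against atol=1.2e-2,
# while a diff of ~0.005 passes (hypothetical values for illustration):
print(within_tolerance(0.5330, 0.5000))  # False: 0.033 > 0.012 + 0.0005
print(within_tolerance(0.5050, 0.5000))  # True
```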

Root Cause

ROCm's default matmul precision settings produce slightly different numerical results.
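To see how reduced matmul precision alone can produce diffs at this scale, here is a standalone sketch (not vLLM code) that crudely emulates a reduced-mantissa format, in the spirit of TF32's 10-bit mantissa, inside a dot product and compares it against full float64 accumulation:

```python
import math
import random

def truncate_mantissa(x: float, bits: int = 10) -> float:
    """Crudely emulate a reduced-precision format by rounding x
    to `bits` mantissa bits (TF32-style, for illustration only)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2 ** bits) / 2 ** bits, e)

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(4096)]
b = [random.uniform(-1, 1) for _ in range(4096)]

exact = sum(x * y for x, y in zip(a, b))
lowp = sum(truncate_mantissa(x) * truncate_mantissa(y) for x, y in zip(a, b))

# Accumulated rounding produces a small but nonzero drift between the two.
print(abs(exact - lowp))
```

This is why the fix pins torch.set_float32_matmul_precision("highest"): it disables the reduced-precision fast path so both sides of the comparison accumulate at full float32 precision.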

Testing

Ran the test 100+ times in a loop, clearing caches between runs so the model recompiles from scratch.
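A flaky test like this is typically validated by repeating it many times and counting failures. The harness below is a generic sketch of that pattern (the real run invoked pytest with cache clearing; fake_modernbert_check is a hypothetical stand-in that fails when a simulated max diff exceeds the 0.012 tolerance):

```python
import random

def run_until_failure(check, attempts: int = 100, seed: int = 0) -> list[int]:
    """Repeat a possibly flaky check and return indices of failing attempts."""
    rng = random.Random(seed)
    return [i for i in range(attempts) if not check(rng)]

def fake_modernbert_check(rng: random.Random) -> bool:
    # Hypothetical flakiness model: the observed max diff is noise around 0,
    # and the check fails whenever it exceeds the atol of 0.012.
    max_diff = abs(rng.gauss(0.0, 0.008))
    return max_diff <= 0.012

fails = run_until_failure(fake_modernbert_check)
print(f"{len(fails)} of 100 runs failed")
```

With a deterministic fix in place, the expectation is that such a loop reports zero failures across 100+ iterations.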

@mergify mergify bot added the rocm Related to AMD ROCm label Jan 6, 2026
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces a workaround for numerical precision issues on ROCm that are causing test flakiness. The change involves modifying PyTorch's Scaled Dot-Product Attention (SDP) and matrix multiplication precision settings. While the fix is necessary, the current implementation using pytest_sessionstart alters global state without reverting it, which could unintentionally affect other tests in the suite. I have provided a suggestion to refactor this into a module-scoped autouse fixture. This is a safer, more idiomatic approach in pytest for managing test-specific setup and teardown, ensuring the changes are properly isolated.

Comment on lines +1 to +29
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""Pytest configuration for vLLM language generation tests."""

import warnings

import torch

from vllm.platforms import current_platform


def pytest_sessionstart(session):
    """Configure ROCm-specific settings before test session starts."""
    if not current_platform.is_rocm():
        return

    # Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers
    # accuracy issues: https://github.com/vllm-project/vllm/issues/30167
    # TODO: Remove once ROCm SDP accuracy issues are resolved on HuggingFace
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.set_float32_matmul_precision("highest")
    warnings.warn(
        "ROCm: Disabled flash_sdp and mem_efficient_sdp, enabled math_sdp "
        "to avoid HuggingFace Transformers accuracy issues",
        UserWarning,
        stacklevel=1,
    )
Severity: high

Using pytest_sessionstart to modify global state like torch settings can have unintended side effects on other tests that run in the same session, as these settings are not reverted. This can lead to slower execution or unexpected behavior in unrelated tests.

A more robust and idiomatic pytest approach is to use a fixture with autouse=True and an appropriate scope (e.g., module). This ensures that the settings are applied only for the relevant tests and, crucially, that the original settings are restored after the tests in the module have completed, preventing any impact on other parts of the test suite.

I've suggested a refactoring to use a module-scoped autouse fixture which encapsulates the setup and teardown logic cleanly.

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""Pytest configuration for vLLM language pooling tests."""

import warnings

import pytest
import torch

from vllm.platforms import current_platform


@pytest.fixture(scope="module", autouse=True)
def rocm_precision_workaround():
    """Workaround for numerical precision issues on ROCm for pooling tests."""
    if not current_platform.is_rocm():
        yield
        return

    # Save original settings
    orig_flash = torch.backends.cuda.flash_sdp_enabled()
    orig_mem_eff = torch.backends.cuda.mem_efficient_sdp_enabled()
    orig_math = torch.backends.cuda.math_sdp_enabled()
    orig_matmul_precision = torch.get_float32_matmul_precision()

    try:
        # Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers
        # accuracy issues: https://github.com/vllm-project/vllm/issues/30167
        # TODO: Remove once ROCm SDP accuracy issues are resolved on HuggingFace
        torch.backends.cuda.enable_flash_sdp(False)
        torch.backends.cuda.enable_mem_efficient_sdp(False)
        torch.backends.cuda.enable_math_sdp(True)
        torch.set_float32_matmul_precision("highest")
        warnings.warn(
            "ROCm: Disabled flash_sdp and mem_efficient_sdp, enabled math_sdp "
            "to avoid HuggingFace Transformers accuracy issues for pooling tests.",
            UserWarning,
            stacklevel=2,
        )
        yield
    finally:
        # Restore original settings
        torch.backends.cuda.enable_flash_sdp(orig_flash)
        torch.backends.cuda.enable_mem_efficient_sdp(orig_mem_eff)
        torch.backends.cuda.enable_math_sdp(orig_math)
        torch.set_float32_matmul_precision(orig_matmul_precision)

@AndreasKaratzas (Collaborator, Author)

That is a bit over-engineered and might not even be functional. The problem is in one specific test inside the Language Models Test (Extended Pooling) group.

@DarkLight1337 (Member)

Just to be sure, this is still needed after #31776?

@AndreasKaratzas (Collaborator, Author)

> Just to be sure, this is still needed after #31776?

@DarkLight1337 Thank you for pointing this PR out to me; I was not aware of it. I'll check the recent changes and probably close this PR. I already see that Flex Attention has been completely removed from the ROCm attention dispatch mechanism.

… fp acc

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas (Collaborator, Author)

@DarkLight1337 there was still a small error on ROCm:

          torch.testing.assert_close(hf_output, vllm_output, atol=1.2e-2, rtol=1e-3)
E           AssertionError: Tensor-likes are not close!
E
E           Mismatched elements: 1 / 323 (0.3%)
E           Greatest absolute difference: 0.0330466628074646 at index (13, 0) (up to 0.012 allowed)
E           Greatest relative difference: 0.09034454077482224 at index (13, 0) (up to 0.001 allowed)
tests/models/language/pooling/test_token_classification.py:81: AssertionError

This PR addresses it.

@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) January 6, 2026 22:23
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 6, 2026
@robertgshaw2-redhat robertgshaw2-redhat merged commit 2a42ae7 into vllm-project:main Jan 6, 2026
20 checks passed
@AndreasKaratzas AndreasKaratzas deleted the akaratza_lang_pool branch January 6, 2026 23:29
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
…y on ROCm (vllm-project#31820)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
…y on ROCm (vllm-project#31820)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…y on ROCm (vllm-project#31820)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…y on ROCm (vllm-project#31820)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm
