[Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff by Isotr0py · Pull Request #31776 · vllm-project/vllm

Isotr0py · 2026-01-06T05:56:57Z

Purpose

Fix the failure raised in [v1] Add encoder-only/cross attention support to Triton Attention backend #31406 (comment).
The Triton kernel shows a minor accuracy difference (~0.001) compared with the previous FlexAttention backend.
Increase rtol to fix the failing pooling test.

Test Plan

pytest -s -v tests/models/language/pooling/test_token_classification.py::test_modernbert_models[float-disham993/electrical-ner-ModernBERT-base]

Test Result

Can't reproduce locally; hope this makes CI green. 🙏

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

gemini-code-assist

Code Review

This pull request addresses a failing test for pooling models by increasing the numerical tolerance in an assertion. While this is a pragmatic fix for the immediate CI failure, I have raised a high-severity concern. The relaxed tolerance, especially when combined with the existing absolute tolerance, could potentially mask future regressions in the model's output. My review comment suggests adding explanatory comments to the code and exploring whether the tolerances can be defined more precisely to ensure the test remains as strict as possible while accounting for expected numerical differences. This is crucial for maintaining the integrity and reliability of the test suite.

gemini-code-assist · 2026-01-06T05:57:57Z

tests/models/language/pooling/test_token_classification.py

        hf_output = hf_output.detach().clone().cpu().float()
        vllm_output = vllm_output.detach().clone().cpu().float()
-        assert torch.allclose(hf_output, vllm_output, atol=1e-2)
+        assert torch.allclose(hf_output, vllm_output, atol=1e-2, rtol=1e-3)


While this change fixes the immediate CI failure, relaxing test tolerances, especially on top of an existing atol=1e-2, increases the risk of masking future regressions. To ensure the test remains as strict as possible while accounting for the numerical noise from the Triton kernel, please consider the following:

Add a brief code comment explaining why this specific test for ModernBERT requires this rtol, unlike the other tests in this file. This provides vital context for future maintenance.

If the discrepancy is primarily relative, could the absolute tolerance atol be tightened? A more precise test might use something like atol=1e-5, rtol=1e-3, which would be stricter for small-magnitude outputs.
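To make the trade-off between the two tolerances concrete, here is a minimal pure-Python sketch of the elementwise rule that torch.allclose documents, |input - other| <= atol + rtol * |other|, with rtol defaulting to 1e-5 when it is not passed. The helper name within_tolerance and the sample values are hypothetical and are not taken from the failing CI run.

```python
def within_tolerance(value: float, reference: float, atol: float, rtol: float) -> bool:
    """Scalar form of the closeness rule documented for torch.allclose.

    `reference` plays the role of the second argument (`other`), whose
    magnitude scales the relative term.
    """
    return abs(value - reference) <= atol + rtol * abs(reference)


# Hypothetical outputs chosen for illustration only.
large_value, large_reference = 5.0104, 5.0   # difference of about 0.0104
small_value, small_reference = 0.006, 0.001  # difference of about 0.005

settings = {
    "atol=1e-2, rtol=1e-5 (old assertion, default rtol)": (1e-2, 1e-5),
    "atol=1e-2, rtol=1e-3 (first commit of this PR)": (1e-2, 1e-3),
    "atol=1e-5, rtol=1e-3 (reviewer's suggestion)": (1e-5, 1e-3),
}

for label, (atol, rtol) in settings.items():
    large_ok = within_tolerance(large_value, large_reference, atol, rtol)
    small_ok = within_tolerance(small_value, small_reference, atol, rtol)
    print(f"{label}: large-magnitude pair={large_ok}, small-magnitude pair={small_ok}")
```

With these made-up numbers, the large-magnitude pair fails the old assertion and passes once rtol=1e-3 is added, because the relative term contributes 0.005 to the bound. The small-magnitude pair passes under atol=1e-2 but fails under atol=1e-5, which is the sense in which the suggested setting is stricter for small outputs.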

increase rtol

d415754

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py requested a review from noooop as a code owner January 6, 2026 05:56

gemini-code-assist bot reviewed Jan 6, 2026

use torch testing assertion

36f34c6

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Jan 6, 2026

Isotr0py changed the title from "[Bugfix][CI/Build] Fix failing pooling models test due to Trion kernel accuracy diff" to "[Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff" Jan 6, 2026

noooop approved these changes Jan 6, 2026

noooop enabled auto-merge (squash) January 6, 2026 06:17

increase atol

7fbaebc

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

noooop merged commit ee2e69d into vllm-project:main Jan 6, 2026
19 checks passed

Isotr0py deleted the fix-pooling-test branch January 6, 2026 11:53

DarkLight1337 mentioned this pull request Jan 6, 2026

[ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm #31820

Merged

LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026

[Bugfix][CI/Build] Fix failing pooling models test due to Triton kern…

724d094

…el accuracy diff (vllm-project#31776) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026

[Bugfix][CI/Build] Fix failing pooling models test due to Triton kern…

28bb9dc

…el accuracy diff (vllm-project#31776) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

MitchLewis930 added a commit to Signal65/vllm-code-review that referenced this pull request Jan 14, 2026

Revert "[Bugfix][CI/Build] Fix failing pooling models test due to Tri…

d681af3

…ton kernel accuracy diff (vllm-project#31776)" This reverts commit ee2e69d.

MitchLewis930 mentioned this pull request Jan 14, 2026

PR_010 - CodeRabbit Signal65/vllm-code-review#12

Closed

akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026

[Bugfix][CI/Build] Fix failing pooling models test due to Triton kern…

54fc630

…el accuracy diff (vllm-project#31776) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

MitchLewis930 mentioned this pull request Jan 21, 2026

Revert "[Bugfix][CI/Build] Fix failing pooling models test due to Tri… Test Signal65/vllm-code-review#24

Open

5 tasks

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[Bugfix][CI/Build] Fix failing pooling models test due to Triton kern…

9d1f8e6

…el accuracy diff (vllm-project#31776) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

noooop merged 3 commits into vllm-project:main from Isotr0py:fix-pooling-test
