
PR_010 - CodeRabbit#12

Closed
MitchLewis930 wants to merge 1 commit into bench/PR_010_base from bench/PR_010_bug__CodeRabbit

Conversation

@MitchLewis930
Collaborator

@MitchLewis930 MitchLewis930 commented Jan 14, 2026

Context
This pull request reproduces a historical code change from the vLLM repository
for the purpose of evaluating automated code review tools.

References
Upstream fix PR: vllm-project#31776

Notes
This PR is not intended to be merged.

Summary by CodeRabbit

  • Tests
    • Updated test validation logic with adjusted tolerance thresholds for model output comparisons.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai bot commented Jan 14, 2026

Warning

Rate limit exceeded

@MitchLewis930 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 11 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between ee2e69d and d681af3.

📒 Files selected for processing (1)
  • tests/models/language/pooling/test_token_classification.py
📝 Walkthrough

Walkthrough

A test assertion method was changed from torch.testing.assert_close to torch.allclose in token classification tests, with the absolute tolerance adjusted from 1.2e-2 to 1e-2 and the relative tolerance parameter removed.
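The diagnostics difference between the two assertion styles matters here: a bare `assert torch.allclose(...)` raises an AssertionError with no message, while `torch.testing.assert_close` pinpoints which elements differ and by how much. A torch-free sketch of that distinction (the `assert_close_1d` helper is hypothetical, for illustration only, not the torch API):

```python
# Hypothetical assert_close-style helper: unlike a bare `assert allclose(...)`,
# it reports which element failed and by how much, similar in spirit to
# torch.testing.assert_close's error messages.
def assert_close_1d(actual, expected, atol):
    diffs = [abs(a - e) for a, e in zip(actual, expected)]
    worst = max(range(len(diffs)), key=diffs.__getitem__)
    if diffs[worst] > atol:
        raise AssertionError(
            f"element {worst}: |{actual[worst]} - {expected[worst]}| "
            f"= {diffs[worst]:.3g} exceeds atol={atol}"
        )

try:
    assert_close_1d([1.0, 2.05], [1.0, 2.0], atol=1e-2)
except AssertionError as exc:
    message = str(exc)  # names element 1 and the size of the mismatch
```

A plain `assert` in the same situation would surface only the test file and line number, leaving the actual numerical gap invisible in CI logs.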

Changes

Cohort / File(s) Summary
Test Assertion Tolerance Update
tests/models/language/pooling/test_token_classification.py
Modified comparison logic in test_modernbert_models: replaced torch.testing.assert_close(..., atol=1.2e-2, rtol=1e-3) with torch.allclose(..., atol=1e-2), removing relative tolerance and slightly tightening absolute tolerance

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 A hop and a skip through assertions we go,
Tolerances tweaked to let tests better flow,
From close to allclose, the barriers align,
1e-2 precision—now that's divine!

🚥 Pre-merge checks (✅ 1 passed | ❌ 2 failed)
❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check: ❓ Inconclusive. The title 'PR_010' is vague and generic, providing no meaningful information about the changeset. Resolution: use a descriptive title that reflects the actual change, such as 'Revert pooling models test tolerance adjustments' or similar.
✅ Passed checks (1 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@MitchLewis930
Collaborator Author

@CodeRabbit review

@coderabbitai

coderabbitai bot commented Jan 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@MitchLewis930 MitchLewis930 changed the title from PR_010 to PR_010 - CodeRabbit on Jan 14, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/models/language/pooling/test_token_classification.py`:
- Line 81: Replace the muted bare assertion using torch.allclose with
torch.testing.assert_close to restore detailed diagnostics and the original
tolerances; specifically, in the test assertion that compares hf_output and
vllm_output (the line currently using torch.allclose), call
torch.testing.assert_close(hf_output, vllm_output, atol=1.2e-2, rtol=1e-3) so
the test uses the intended looser absolute and relative tolerances and emits
helpful element-level difference messages on failure.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee2e69d and d681af3.

📒 Files selected for processing (1)
  • tests/models/language/pooling/test_token_classification.py

         hf_output = hf_output.detach().clone().cpu().float()
         vllm_output = vllm_output.detach().clone().cpu().float()
-        torch.testing.assert_close(hf_output, vllm_output, atol=1.2e-2, rtol=1e-3)
+        assert torch.allclose(hf_output, vllm_output, atol=1e-2)


⚠️ Potential issue | 🟠 Major

Reverting from torch.testing.assert_close to torch.allclose degrades test quality and may cause flakiness.

This change introduces two issues:

  1. Loss of diagnostics: torch.testing.assert_close provides detailed error messages showing which elements differ and by how much. Using assert torch.allclose(...) only raises a bare AssertionError with no useful debugging information when the test fails.

  2. Stricter tolerance: The original used atol=1.2e-2, rtol=1e-3. The new code uses atol=1e-2 with torch.allclose's default rtol=1e-5, which is stricter on both the absolute (~17% tighter) and relative (100× tighter) tolerances. Given this test exists specifically to handle Triton kernel numerical differences (per the upstream PR #31776 this reverts), tighter tolerances risk intermittent test failures.
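For context on the tolerance arithmetic, both torch.allclose and torch.testing.assert_close apply the elementwise rule |actual - expected| <= atol + rtol * |expected|. A minimal pure-Python sketch of that rule (the `within_tolerance` helper is hypothetical; real code would call the torch functions directly):

```python
# Elementwise tolerance rule used by torch.allclose / torch.testing.assert_close:
#   |actual - expected| <= atol + rtol * |expected|
# `within_tolerance` is a hypothetical helper for illustration only.
def within_tolerance(actual, expected, atol=0.0, rtol=0.0):
    return abs(actual - expected) <= atol + rtol * abs(expected)

# A pair of logits differing by 0.011 passes under the original settings
# (atol=1.2e-2, rtol=1e-3) but fails under the new atol=1e-2 combined with
# torch.allclose's default rtol=1e-5.
old_ok = within_tolerance(1.011, 1.000, atol=1.2e-2, rtol=1e-3)
new_ok = within_tolerance(1.011, 1.000, atol=1e-2, rtol=1e-5)
```

This is why the change is effectively a tightening even though only atol appears in the new call: dropping the explicit rtol falls back to a much smaller default.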

Suggested fix
-        assert torch.allclose(hf_output, vllm_output, atol=1e-2)
+        torch.testing.assert_close(hf_output, vllm_output, atol=1.2e-2, rtol=1e-3)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert torch.allclose(hf_output, vllm_output, atol=1e-2)
torch.testing.assert_close(hf_output, vllm_output, atol=1.2e-2, rtol=1e-3)

@MitchLewis930
Collaborator Author

@CodeRabbit full review

@coderabbitai

coderabbitai bot commented Jan 15, 2026

✅ Actions performed

Full review triggered.

