
[Bugfix] Fix FP8 online quantization premature trigger with TP sharded weights #36621

Open
AjAnubolu wants to merge 1 commit into vllm-project:main from AjAnubolu:fix/fp8-tp-empty-output-36583

Conversation

@AjAnubolu
Contributor

Use `>=` instead of `==` for the loaded-numel check to guard against edge cases in `copy_` tracking with TP > 1.

Closes #36583
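For readers following along, the failure mode is easiest to see as a sketch. The class and method names below are illustrative, not vLLM's actual API: the point is only the comparison operator in the "fully loaded" check.

```python
# Minimal sketch of the loaded-numel check this PR changes
# (CopyNumelCounter here is a simplified stand-in, not vLLM's class).
# FP8 online quantization defers weight processing until the whole
# weight has been written via copy_. With tensor parallelism (TP > 1),
# overlapping or padded shard copies can push the tracked element
# count past the weight's numel, so a strict == check never fires,
# the deferred quantization is skipped, and the model emits empty output.

class CopyNumelCounter:
    """Tracks how many elements have been copied into a weight."""

    def __init__(self, target_numel: int) -> None:
        self.target_numel = target_numel
        self.loaded_numel = 0

    def record_copy(self, copied_numel: int) -> None:
        # Called from a copy_ hook; accumulates elements written so far.
        self.loaded_numel += copied_numel

    def is_fully_loaded(self) -> bool:
        # The fix: >= instead of ==, so over-counting from duplicated
        # shard regions still triggers the deferred FP8 processing.
        return self.loaded_numel >= self.target_numel


counter = CopyNumelCounter(target_numel=1024)
counter.record_copy(512)
counter.record_copy(512)
counter.record_copy(64)   # duplicated/padded shard copy under TP
print(counter.is_fully_loaded())  # True with >=; == would return False
```

With `==`, the counter above lands at 1088 and the trigger is missed forever; with `>=`, processing fires as soon as the count reaches the target, which is the behavior this PR restores.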

@mergify mergify bot added the bug Something isn't working label Mar 10, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in FP8 online quantization for models with tensor parallelism. The change modifies the condition for triggering the weight processing from an equality check to a greater-than-or-equal-to check. This is intended to make the logic more robust against edge cases with sharded weights. New regression tests have been added to cover this scenario and verify the behavior of the CopyNumelCounter.
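The regression tests mentioned above are not shown in this conversation; a self-contained sketch of what such a test could look like follows (function names are illustrative, not the actual vLLM test code):

```python
# Hedged sketch of a regression test for the == -> >= change.
# is_fully_loaded stands in for the patched condition on the counter.

def is_fully_loaded(loaded_numel: int, target_numel: int) -> bool:
    # Patched condition: >= tolerates copy_ over-counting under TP.
    return loaded_numel >= target_numel


def test_exact_load_triggers():
    # Single-GPU case: the count matches the weight's numel exactly.
    assert is_fully_loaded(loaded_numel=1024, target_numel=1024)


def test_overcounted_tp_load_still_triggers():
    # TP > 1: duplicated/padded shard copies overshoot the target;
    # the old == check would never fire here.
    assert is_fully_loaded(loaded_numel=1088, target_numel=1024)


def test_partial_load_does_not_trigger():
    # A half-written weight must not be quantized prematurely.
    assert not is_fully_loaded(loaded_numel=512, target_numel=1024)
```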

@vkuzo
Contributor

vkuzo commented Mar 12, 2026

@AjAnubolu could you share which models hit this edge case so we are aware?

note that #33814 should also take care of this

…d weights

Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
@AjAnubolu AjAnubolu force-pushed the fix/fp8-tp-empty-output-36583 branch from 4635dc9 to f3d6e91 Compare March 13, 2026 03:07

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Empty output when using FP8 + Tensor Parallel (2 GPUs) with Qwen3-8B

2 participants