[Bugfix] Fix FP8 Bias Loading#41424
Merged
Isotr0py merged 4 commits into vllm-project:main on May 3, 2026
Conversation
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Contributor
Code Review
This pull request updates the materialize_layer function to ensure that only meta tensors are materialized, preventing the overwriting of already initialized non-meta tensors. A new test case, test_materialize_layer_preserves_non_meta_tensors, has been added to verify this logic. I have no feedback to provide.
Isotr0py
approved these changes
May 2, 2026
joa-stdn
pushed a commit
to joa-stdn/vllm
that referenced
this pull request
May 4, 2026
Signed-off-by: Alex Brooks <albrooks@redhat.com> Signed-off-by: Joachim Studnia <joachim@mistral.ai>
chaojun-zhang
pushed a commit
to chaojun-zhang/vllm
that referenced
this pull request
May 6, 2026
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Copilot AI
pushed a commit
to hongbolv/vllm
that referenced
this pull request
May 7, 2026
Signed-off-by: Alex Brooks <albrooks@redhat.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
ikaadil
pushed a commit
to ikaadil/vllm
that referenced
this pull request
May 7, 2026
Signed-off-by: Alex Brooks <albrooks@redhat.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Purpose
Fixes the underlying cause of #41284
The issue is that when layers have bias=True, we materialize the weight meta tensors, replacing each one with a new on-device tensor created via torch.empty_strided. The materialization currently does this to everything, including the bias, even though it should only apply to the weights. This corrupts the already-loaded bias values, which creates NaNs in forward() and ultimately produces garbage output.
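To illustrate the fix, here is a minimal, hypothetical sketch of a materialize_layer helper that only replaces meta tensors, leaving already-initialized parameters (such as a bias loaded from the checkpoint) untouched. The function body and signature are assumptions for illustration and are not vLLM's actual implementation:

```python
import torch
import torch.nn as nn


def materialize_layer(layer: nn.Module, device: str = "cpu") -> None:
    """Replace only meta tensors with freshly allocated on-device tensors.

    Hypothetical sketch: parameters that already hold real data (e.g. a
    bias loaded from a checkpoint) are skipped, which is the behavior
    this PR restores.
    """
    # Snapshot the parameter list so replacing entries is safe.
    for name, param in list(layer.named_parameters(recurse=False)):
        if not param.is_meta:
            # Already initialized (e.g. loaded bias) -- do not overwrite.
            continue
        materialized = torch.empty_strided(
            size=param.shape,
            stride=param.stride(),
            dtype=param.dtype,
            device=device,
        )
        setattr(layer, name, nn.Parameter(materialized, param.requires_grad))


# Usage: a layer whose weight is still on the meta device but whose bias
# has already been loaded with real values.
layer = nn.Linear(2, 2)
layer.bias = nn.Parameter(torch.ones(2))
layer.weight = nn.Parameter(torch.empty(2, 2, device="meta"))

materialize_layer(layer)
assert not layer.weight.is_meta          # weight was materialized
assert torch.equal(layer.bias, torch.ones(2))  # bias preserved
```

Without the is_meta check, the loaded bias would be replaced by an uninitialized torch.empty_strided buffer, producing the NaNs described above.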
The NaN handling is also why things worked for granite speech with fp8 in 0.17 but not in 0.20. I think the native forward doesn't handle NaNs in the same way, which is why the values diverge; I'll open a separate PR to discuss that.
Test Plan
Added an explicit test - you can also verify the fix with a minimal fp8 example with granite speech.
Test Result
On main:
After fix:
CC @DarkLight1337 @robertgshaw2-redhat @lokashrinav