[BugFix] fix gptq_marlin_gemm has no parameter called b_bias #13571

Merged
AniZpZ merged 14 commits into sgl-project:main from ehuaa:fix-marlin-fp8
Dec 17, 2025

@ehuaa
Contributor

@ehuaa ehuaa commented Nov 19, 2025

Motivation

This PR needs to be merged before PR #13524. The sglang version of the FP8 Marlin linear kernel has a bug, which is fixed here.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @ehuaa, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug in the FP8 Marlin linear kernel where the gptq_marlin_gemm function was incorrectly invoked with a b_bias parameter it did not support. The change modifies the implementation to remove the unsupported parameter from the gptq_marlin_gemm call and instead applies the bias as a separate addition to the output tensor after the GEMM operation. This ensures the correct handling of bias in the FP8 Marlin linear kernel, improving the accuracy and robustness of the calculations.

Highlights

  • Bug Fix: Corrected an issue in the FP8 Marlin linear kernel where the gptq_marlin_gemm function was being called with an unsupported b_bias parameter.
  • Bias Application Logic: Refactored the bias application to occur explicitly after the gptq_marlin_gemm operation, ensuring that bias is correctly added to the output when present.
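
As a rough illustration of the pattern described above, the following pure-Python sketch uses a hypothetical stand-in, `stand_in_gemm`, in place of the real `gptq_marlin_gemm` kernel, whose actual signature is not shown in this thread. It assumes only what the summary states: the kernel accepts no `b_bias` argument, so the bias is added to the output in a separate step. All function names below are illustrative and are not the code changed in this PR.

```python
def stand_in_gemm(a, b):
    """Hypothetical stand-in for a GEMM kernel whose signature has no bias parameter."""
    # Plain matrix multiply on lists of lists: rows of `a` times columns of `b`.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def linear_with_bias(a, b, bias=None):
    """Call the kernel without a bias argument, then add the bias afterwards."""
    out = stand_in_gemm(a, b)
    if bias is not None:
        # Add the bias vector to every output row as a separate step.
        out = [[v + bv for v, bv in zip(row, bias)] for row in out]
    return out


def call_with_unsupported_bias(a, b, bias):
    """Show what happens when an unsupported keyword is passed to the kernel."""
    try:
        stand_in_gemm(a, b, b_bias=bias)  # the stand-in has no `b_bias` parameter
    except TypeError as exc:
        return str(exc)
    return None


a = [[1, 2], [3, 4]]
identity = [[1, 0], [0, 1]]
print(linear_with_bias(a, identity, bias=[10, 20]))  # [[11, 22], [13, 24]]
print(call_with_unsupported_bias(a, identity, [10, 20]))
```

Passing `b_bias=` to the stand-in raises a `TypeError`, the same class of error named in the PR title. The wrapper avoids it by keeping the bias out of the kernel call and adding it to the result.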

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a bug in apply_fp8_marlin_linear where the gptq_marlin_gemm kernel was called with an unsupported b_bias parameter. The fix correctly removes this parameter from the kernel invocation and applies the bias to the output tensor in a separate step. This change aligns the function's behavior with other similar linear layers in the codebase that use the Marlin kernel. The fix is correct and well-implemented.

@AniZpZ AniZpZ added the quant (LLM Quantization) and run-ci labels Nov 20, 2025
@AniZpZ AniZpZ added the ready-to-merge label (The PR is ready to merge after the CI is green.) Nov 21, 2025
@AniZpZ AniZpZ enabled auto-merge (squash) December 9, 2025 03:41
@ehuaa
Contributor Author

ehuaa commented Dec 12, 2025

Hi @AniZpZ @FlamingoPg, can this PR be merged now? The CI error above may not be related to this PR.

@AniZpZ
Collaborator

AniZpZ commented Dec 12, 2025

> Hi @AniZpZ @FlamingoPg, can this PR be merged now? The CI error above may not be related to this PR.

We can only merge the PR once the required CI checks have passed.

@ehuaa
Contributor Author

ehuaa commented Dec 14, 2025

> > Hi @AniZpZ @FlamingoPg, can this PR be merged now? The CI error above may not be related to this PR.
>
> We can only merge the PR once the required CI checks have passed.

@AniZpZ Thanks, I got it. I merged the main branch into this branch again. Let's wait and see whether the CI passes all tests this time~

@AniZpZ AniZpZ disabled auto-merge December 15, 2025 08:37
@AniZpZ
Collaborator

AniZpZ commented Dec 16, 2025

> > > Hi @AniZpZ @FlamingoPg, can this PR be merged now? The CI error above may not be related to this PR.
> >
> > We can only merge the PR once the required CI checks have passed.
>
> @AniZpZ Thanks, I got it. I merged the main branch into this branch again. Let's wait and see whether the CI passes all tests this time~

I will take over this PR and get it merged.

@AniZpZ AniZpZ merged commit 03f9eb2 into sgl-project:main Dec 17, 2025
145 of 151 checks passed
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 17, 2025
…ject#13571)

Co-authored-by: Peng Zhang <aniz1905@gmail.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
…ject#13571)

Co-authored-by: Peng Zhang <aniz1905@gmail.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026
…ject#13571)

Co-authored-by: Peng Zhang <aniz1905@gmail.com>
Co-authored-by: Fan Yin <1106310035@qq.com>