
Fix gptoss 4bit #524

Merged
danielhanchen merged 10 commits into unslothai:main from Datta0:fix_gptoss_4bit
Feb 26, 2026

Conversation

@Datta0 Datta0 commented Feb 26, 2026

Collaborator

I accidentally added some code to add 4-bit conversion support for nn.Parameter. That seems to have caused issues with GPT OSS 4-bit on T4. This PR undoes that code :)

Source of the issue: #495

Note: I merged #519 to resolve the path issue for testing

@Datta0 Datta0 changed the base branch from main to fix/gpt-oss-parametermodule-compiler-export February 26, 2026 16:52
@gemini-code-assist

Contributor

Summary of Changes

Hello @Datta0, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily addresses and resolves issues encountered with GPT OSS 4-bit models on T4 GPUs by reverting previously introduced code that aimed to convert nn.Parameter to 4-bit support. The changes streamline the compilation process by removing problematic temporary patches related to 4-bit parameter handling in PEFT and Hugging Face Transformers, ensuring more stable and correct behavior for quantized models. Additionally, it refines decorator stripping in the compiler and improves model name parsing for better consistency.

Highlights

  • Reverted 4-bit Parameter Conversion: Undid previous changes that introduced explicit conversion of nn.Parameter to 4-bit support, which was causing issues with GPT OSS 4-bit models on T4 GPUs.
  • Removed PEFT and Transformers 4-bit MoE Patches: Eliminated several temporary patches related to PEFT's MoE 4-bit parameter wrapper, BitsAndBytes 4-bit quantization for MoE parameters, and dequantization logic within transformers integrations.
  • Refined Compiler Decorator Stripping: Expanded the list of decorators that the compiler strips from source code, including use_kernel_forward_from_hub, use_kernelized_func, and auto_docstring, and added a new logger.info statement for MoE imports.
  • Improved GPT OSS Model Name Handling: Introduced a new helper function _normalized_unsloth_model_name to standardize the retrieval of the model name from environment variables, and updated various GPT OSS-related checks to use this helper.
  • Enhanced Attention Mask Retrieval: Modified the return_attention_mask wrapper to more robustly identify input_embeds and expanded the accepted data types for attention masks to include torch.int64 and torch.bool.
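
To make the model-name highlight concrete, here is a minimal sketch of what a helper like _normalized_unsloth_model_name might look like. The PR summary only says that it standardizes reading UNSLOTH_MODEL_NAME from the environment; the specific normalization rules below (trimming, lowercasing, treating "_" as "-") and both function names are assumptions for illustration, not the actual unsloth_zoo implementation.

```python
import os


def normalized_model_name(env=None):
    """Hypothetical stand-in for _normalized_unsloth_model_name.

    Reads UNSLOTH_MODEL_NAME and returns a canonical form so that every
    caller compares against the same string.
    """
    env = os.environ if env is None else env
    raw = env.get("UNSLOTH_MODEL_NAME", "")
    # Assumed normalization: trim whitespace, lowercase, treat "_" like "-".
    return raw.strip().lower().replace("_", "-")


def looks_like_gpt_oss(env=None):
    """Hypothetical check that a GPT OSS patch could share with its siblings."""
    return "gpt-oss" in normalized_model_name(env)
```

Routing every GPT OSS check through one helper means a change to the normalization rule only has to be made in one place, rather than in each patch function separately.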

Changelog
  • unsloth_zoo/compiler.py
    • Added a logging statement when MoE imports are added.
    • Expanded the set of decorators to be stripped during compilation to include use_kernel_forward_from_hub, use_kernelized_func, and auto_docstring.
    • Modified the conditional logic for patching forward methods so that it applies only if @torch.compiler.disable is present.
    • Introduced a new regex substitution to remove @use_kernelized_func from source code.
  • unsloth_zoo/temporary_patches/gpt_oss.py
    • Added patch_gpt_oss_compiler_exports to export necessary helper symbols for compiler-generated GPT-OSS modules.
    • Created a new helper function _normalized_unsloth_model_name to consistently process the UNSLOTH_MODEL_NAME environment variable.
    • Updated multiple functions (_should_use_gpt_oss_bnb4bit, _is_gpt_oss_4bit_load, patch_gpt_oss_moe_for_lora, patch_gpt_oss_linearized, patch_GptOssAttention, patch_GptOssModel, patch_gpt_oss_init_weights_modulelist_fix, patch_gpt_oss_for_grpo) to utilize _normalized_unsloth_model_name for model name checks.
    • Improved the return_attention_mask wrapper to search for input_embeds more comprehensively and to accept torch.int64 and torch.bool as valid attention mask dtypes.
  • unsloth_zoo/temporary_patches/misc.py
    • Removed patch_peft_moe_4bit_paramwrapper_and_injection, which handled PEFT's MoE 4-bit parameter injection and ParamWrapper behavior.
    • Removed patch_transformers_bnb4bit_moe_param_quantization, which patched BitsAndBytes quantization for MoE parameters.
    • Removed patch_transformers_moe_bnb4bit_dequantization, which patched dequantization of Params4bit expert tensors.
  • unsloth_zoo/temporary_patches/moe_utils.py
    • Removed the auto-dequantization logic for BitsAndBytes 4-bit packed MoE parameters from _get_base_weight.
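
The decorator stripping described for unsloth_zoo/compiler.py can be sketched roughly as follows. The three decorator names come from the changelog above; the regex, the constant name, and the strip_decorators function are illustrative assumptions and are not taken from the actual compiler code. The sketch only handles decorators that fit on a single line.

```python
import re

# Decorator names listed in the changelog; the tuple name is hypothetical.
STRIPPED_DECORATORS = (
    "use_kernel_forward_from_hub",
    "use_kernelized_func",
    "auto_docstring",
)

# Matches a whole decorator line, with an optional dotted prefix and
# optional single-line call arguments, e.g. `@use_kernelized_func(fn)`.
_DECORATOR_RE = re.compile(
    r"^[ \t]*@(?:\w+\.)*(?:%s)\b(?:\([^\n]*\))?[ \t]*\n" % "|".join(STRIPPED_DECORATORS),
    flags=re.MULTILINE,
)


def strip_decorators(source: str) -> str:
    """Remove the listed decorator lines and leave everything else untouched."""
    return _DECORATOR_RE.sub("", source)
```

Decorators that are not in the list, such as @staticmethod, are left in place, so the stripped source still defines the same classes and functions.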

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request reverts code related to 4-bit support for Mixture-of-Experts (MoE) models, which was causing issues. The changes primarily involve deleting the problematic patches from unsloth_zoo/temporary_patches/misc.py and related dequantization logic from unsloth_zoo/temporary_patches/moe_utils.py. Additionally, there are minor improvements in unsloth_zoo/compiler.py, including a new log message and a more specific condition for detecting patched forward functions. The changes are consistent with the goal of fixing the described issue, and I have not found any problems with this pull request.

@Datta0 Datta0 changed the base branch from fix/gpt-oss-parametermodule-compiler-export to main February 26, 2026 16:55
@danielhanchen danielhanchen merged commit 7e1356c into unslothai:main Feb 26, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12a575ae4e


Comment thread unsloth_zoo/compiler.py
if orig_fwd:
    patched_forward_info = (func_match.group(1), orig_fwd.group(1))
    disable = None  # Keep patched source as-is for renamed forward replacements
if "@torch.compiler.disable" in forward_source:


P1: Detect renamed forward patches without disable decorator

Limiting renamed-forward detection to sources containing @torch.compiler.disable skips valid patched forwards that are renamed but undecorated (for example patch_function(DeepseekV3MoE, "forward", patched_moe_forward) in temporary_patches/deepseek_v3_moe.py). When this branch is skipped, create_standalone_class no longer swaps the class’s original forward with the patched implementation, so compiled modules silently fall back to stale/original forward logic and lose the runtime patch behavior.
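
The concern can be illustrated with a small, self-contained sketch. This is not the logic in unsloth_zoo/compiler.py; detect_renamed_forward and its require_disable_decorator flag are hypothetical names used only to show how gating detection on the decorator string changes the outcome for an undecorated, renamed forward.

```python
import re


def detect_renamed_forward(forward_source, require_disable_decorator):
    """Return the patched function's name if it is not `forward`, else None.

    With require_disable_decorator=True, sources that lack the
    "@torch.compiler.disable" string are skipped entirely, which mirrors
    the condition the review comment warns about.
    """
    if require_disable_decorator and "@torch.compiler.disable" not in forward_source:
        return None
    match = re.search(r"^\s*def\s+(\w+)\s*\(", forward_source, flags=re.MULTILINE)
    if match and match.group(1) != "forward":
        return match.group(1)
    return None


# An undecorated patched forward, like the DeepseekV3MoE example above.
undecorated = "def patched_moe_forward(self, hidden_states):\n    return hidden_states\n"

# A patched forward that carries the decorator.
decorated = "@torch.compiler.disable\ndef patched_forward(self, x):\n    return x\n"

gated = detect_renamed_forward(undecorated, require_disable_decorator=True)
ungated = detect_renamed_forward(undecorated, require_disable_decorator=False)
```

In this sketch the gated call misses the undecorated patch while the ungated call finds it, and the decorated source is detected either way.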
