Fix gptoss 4bit by Datta0 · Pull Request #524 · unslothai/unsloth-zoo

Datta0 · 2026-02-26T16:52:25Z

I accidentally added some code that converts nn.Parameter to 4-bit. That seems to have caused issues with GPT OSS 4-bit on T4. This PR undoes that code :)

Source of the issue: #495

Note: I merged #519 to resolve the path issue for testing

gemini-code-assist · 2026-02-26T16:52:46Z

Summary of Changes

Hello @Datta0, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily addresses and resolves issues encountered with GPT OSS 4-bit models on T4 GPUs by reverting previously introduced code that aimed to convert nn.Parameter to 4-bit support. The changes streamline the compilation process by removing problematic temporary patches related to 4-bit parameter handling in PEFT and Hugging Face Transformers, ensuring more stable and correct behavior for quantized models. Additionally, it refines decorator stripping in the compiler and improves model name parsing for better consistency.

Highlights

Reverted 4-bit Parameter Conversion: Undid previous changes that introduced explicit conversion of nn.Parameter to 4-bit support, which was causing issues with GPT OSS 4-bit models on T4 GPUs.
Removed PEFT and Transformers 4-bit MoE Patches: Eliminated several temporary patches related to PEFT's MoE 4-bit parameter wrapper, BitsAndBytes 4-bit quantization for MoE parameters, and dequantization logic within transformers integrations.
Refined Compiler Decorator Stripping: Expanded the list of decorators that the compiler strips from source code, including use_kernel_forward_from_hub, use_kernelized_func, and auto_docstring, and added a new logger.info statement for MoE imports.
Improved GPT OSS Model Name Handling: Introduced a new helper function _normalized_unsloth_model_name to standardize the retrieval of the model name from environment variables, and updated various GPT OSS-related checks to use this helper.
Enhanced Attention Mask Retrieval: Modified the return_attention_mask wrapper to more robustly identify input_embeds and expanded the accepted data types for attention masks to include torch.int64 and torch.bool.
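
As an illustration of the model-name handling mentioned in the highlights, here is a minimal sketch of what a normalizing helper might look like. The PR only states that `_normalized_unsloth_model_name` standardizes reading `UNSLOTH_MODEL_NAME`, and the commit titles mention hyphenated names. The exact rules below (trim, lowercase, hyphens to underscores) and the function names `normalized_model_name` and `looks_like_gpt_oss` are assumptions, not the code in unsloth_zoo.

```python
import os


def normalized_model_name(env=None):
    """Hypothetical stand-in for _normalized_unsloth_model_name.

    Assumption: normalization means trimming, lowercasing, and mapping
    hyphens to underscores so "gpt-oss" and "gpt_oss" compare equal.
    """
    env = os.environ if env is None else env
    raw = env.get("UNSLOTH_MODEL_NAME", "")
    return raw.strip().lower().replace("-", "_")


def looks_like_gpt_oss(env=None):
    """Illustrative guard: True when the normalized name mentions gpt_oss."""
    return "gpt_oss" in normalized_model_name(env)
```

Routing every guard through a single helper means a hyphenated name and an underscored name are treated the same way in each check, rather than each call site parsing the environment variable on its own.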

Changelog

unsloth_zoo/compiler.py
- Added a logging statement when MoE imports are added.
- Expanded the set of decorators to be stripped during compilation to include use_kernel_forward_from_hub, use_kernelized_func, and auto_docstring.
- Modified the conditional logic for patching forward methods, now only applying if @torch.compiler.disable is present.
- Introduced a new regex substitution to remove @use_kernelized_func from source code.
unsloth_zoo/temporary_patches/gpt_oss.py
- Added patch_gpt_oss_compiler_exports to export necessary helper symbols for compiler-generated GPT-OSS modules.
- Created a new helper function _normalized_unsloth_model_name to consistently process the UNSLOTH_MODEL_NAME environment variable.
- Updated multiple functions (_should_use_gpt_oss_bnb4bit, _is_gpt_oss_4bit_load, patch_gpt_oss_moe_for_lora, patch_gpt_oss_linearized, patch_GptOssAttention, patch_GptOssModel, patch_gpt_oss_init_weights_modulelist_fix, patch_gpt_oss_for_grpo) to utilize _normalized_unsloth_model_name for model name checks.
- Improved the return_attention_mask wrapper to search for input_embeds more comprehensively and to accept torch.int64 and torch.bool as valid attention mask dtypes.
unsloth_zoo/temporary_patches/misc.py
- Removed patch_peft_moe_4bit_paramwrapper_and_injection, which handled PEFT's MoE 4-bit parameter injection and ParamWrapper behavior.
- Removed patch_transformers_bnb4bit_moe_param_quantization, which patched BitsAndBytes quantization for MoE parameters.
- Removed patch_transformers_moe_bnb4bit_dequantization, which patched dequantization of Params4bit expert tensors.
unsloth_zoo/temporary_patches/moe_utils.py
- Removed the auto-dequantization logic for BitsAndBytes 4-bit packed MoE parameters from _get_base_weight.
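
The decorator stripping listed for unsloth_zoo/compiler.py can be pictured with a small regex sketch. This is not the compiler's actual implementation: the pattern, the `STRIPPED_DECORATORS` tuple, and the `strip_decorators` function are hypothetical, and the sketch assumes each decorator (with any arguments) fits on a single line.

```python
import re

# Decorator names taken from the changelog; the tuple name is hypothetical.
STRIPPED_DECORATORS = (
    "use_kernel_forward_from_hub",
    "use_kernelized_func",
    "auto_docstring",
)

# Match a whole decorator line: optional indentation, "@", an optional dotted
# module prefix, one of the names, optional single-line arguments, newline.
_DECORATOR_RE = re.compile(
    r"^[ \t]*@(?:[\w.]+\.)?(?:" + "|".join(STRIPPED_DECORATORS) + r")\b"
    r"(?:\([^\n]*\))?[ \t]*\n",
    flags=re.MULTILINE,
)


def strip_decorators(source):
    """Remove the listed decorators from source text, leaving others intact."""
    return _DECORATOR_RE.sub("", source)
```

Other decorators, such as `@torch.compiler.disable`, are left in place because their final name is not in the list. A decorator whose arguments span several lines would not be matched by this sketch.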

Activity

The author identified an issue where accidentally added code for 4-bit nn.Parameter conversion caused problems with GPT OSS 4-bit on T4 GPUs.
The author merged pull request #519 ("Fix missing ParameterModule export in GPT-OSS compiler path") to resolve a path issue, which was necessary for testing these fixes.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request reverts code related to 4-bit support for Mixture-of-Experts (MoE) models, which was causing issues. The changes primarily involve deleting the problematic patches from unsloth_zoo/temporary_patches/misc.py and related dequantization logic from unsloth_zoo/temporary_patches/moe_utils.py. Additionally, there are minor improvements in unsloth_zoo/compiler.py, including a new log message and a more specific condition for detecting patched forward functions. The changes are consistent with the goal of fixing the described issue, and I have not found any problems with this pull request.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12a575ae4e

chatgpt-codex-connector · 2026-02-26T16:57:35Z

-        if orig_fwd:
-            patched_forward_info = (func_match.group(1), orig_fwd.group(1))
-            disable = None  # Keep patched source as-is for renamed forward replacements
+    if "@torch.compiler.disable" in forward_source:


Detect renamed forward patches without disable decorator

Limiting renamed-forward detection to sources containing @torch.compiler.disable skips valid patched forwards that are renamed but undecorated (for example patch_function(DeepseekV3MoE, "forward", patched_moe_forward) in temporary_patches/deepseek_v3_moe.py). When this branch is skipped, create_standalone_class no longer swaps the class’s original forward with the patched implementation, so compiled modules silently fall back to stale/original forward logic and lose the runtime patch behavior.
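
The gap described in this review comment can be reproduced with plain strings. The sketch below is illustrative only: `detect_decorated_only` mirrors the new condition from the diff, while `is_renamed_forward` is a hypothetical name-based check and not the compiler's actual logic or the check that was removed.

```python
import re


def detect_decorated_only(forward_source):
    """Mirror of the new condition: only decorated sources are considered."""
    return "@torch.compiler.disable" in forward_source


def is_renamed_forward(forward_source):
    """Hypothetical alternative: flag any function not literally named forward."""
    match = re.search(r"^\s*def\s+(\w+)\s*\(", forward_source, flags=re.MULTILINE)
    return match is not None and match.group(1) != "forward"


# A patched forward that is renamed but carries no decorator.
undecorated_patch = (
    "def patched_moe_forward(self, hidden_states):\n"
    "    return hidden_states\n"
)

# The same function with the decorator applied.
decorated_patch = "@torch.compiler.disable\n" + undecorated_patch
```

With these inputs, `detect_decorated_only(undecorated_patch)` is `False` even though `is_renamed_forward(undecorated_patch)` is `True`, which is the case the comment warns about: a renamed, undecorated patch is skipped by the decorator-based condition.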

danielhanchen and others added 9 commits February 25, 2026 17:09

Fix GPT-OSS compiler exports for ParameterModule

2524b1b

Normalize GPT-OSS model-name guard for compiler exports

5776d3c

Normalize GPT-OSS model name guards for hyphenated names

d164b79

Strip use_kernel_forward_from_hub decorators during class rewrite

2e1b178

Clean compiler export patch flow and drop redundant pass

4dd1ed6

Strip @auto_docstring decorators during class rewrite

a870ecc

Handle GPT-OSS 5.2 mask kwargs and strip kernelized decorators

bf5e103

Undo 4bit moe lora

aec5747

Merge remote-tracking branch 'origin/fix/gpt-oss-parametermodule-comp…

f948f61

…iler-export' into fix_gptoss_4bit

Datta0 changed the base branch from main to fix/gpt-oss-parametermodule-compiler-export February 26, 2026 16:52

logging

d4518b5

Datta0 force-pushed the fix_gptoss_4bit branch from 12a575a to d4518b5 Compare February 26, 2026 16:53

Datta0 requested review from danielhanchen and mmathew23 February 26, 2026 16:54

gemini-code-assist Bot reviewed Feb 26, 2026

Datta0 changed the base branch from fix/gpt-oss-parametermodule-compiler-export to main February 26, 2026 16:55

danielhanchen merged commit 7e1356c into unslothai:main Feb 26, 2026

chatgpt-codex-connector Bot reviewed Feb 26, 2026

danielhanchen merged 10 commits into unslothai:main from Datta0:fix_gptoss_4bit
