feat-qgalore by ved1beta · Pull Request #3654 · axolotl-ai-cloud/axolotl

ved1beta · 2026-05-14T17:23:37Z

Description

feat-qgalore
https://arxiv.org/pdf/2407.08296

Motivation and Context

#1752

How has this been tested?

unit test + manual run

AI Usage Disclaimer

Claude Opus helped with ideation and testing.

Summary by CodeRabbit

Release Notes

New Features
- Added Q-GaLore optimizer (q_galore_adamw8bit) with configurable rank, projection, and quantization parameters for memory-efficient fine-tuning.
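- Requires full fine-tuning mode (incompatible with adapters) and rejects DeepSpeed. When FSDP is used, FSDP2 with use_orig_params=True is required. Mixed precision (bf16 or fp16) is recommended but only triggers a warning when missing.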
Documentation
- Added Q-GaLore optimizer installation guidance and complete YAML configuration examples.

coderabbitai · 2026-05-14T17:23:50Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2be2eada-229a-47b1-9db4-93f45456c885

📝 Walkthrough

This PR adds complete support for the Q-GaLore optimizer to Axolotl by introducing utility functions for bitsandbytes compatibility and parameter grouping, extending the configuration schema with Q-GaLore-specific hyperparameters, validating incompatibilities and constraints, integrating the optimizer into the trainer builder, testing end-to-end training, and documenting the optimizer for users.

Changes

Q-GaLore Optimizer Support

Layer / File(s)	Summary
Q-GaLore Integration Utilities `src/axolotl/utils/optimizers/qgalore.py`	`patch_q_galore_for_modern_bnb()` detects newer bitsandbytes via function signature inspection and monkey-patches Q-GaLore's optimizer update functions for compatibility. `build_qgalore_param_groups()` partitions trainable parameters into two optimizer groups: 2D weights matching target modules for Q-GaLore projection, and all other trainable parameters in a plain group.
Configuration Schema `src/axolotl/utils/schemas/training.py`	Adds 10 optional Q-GaLore hyperparameter fields to `HyperparametersConfig` with defaults and descriptions covering rank, projection update cadence, scaling, projection type, INT quantization controls, and adaptive-subspace update parameters.
Configuration Validation and Tests `src/axolotl/utils/schemas/validation.py`, `tests/utils/schemas/validation/test_qgalore.py`	`check_qgalore` validator enforces optimizer constraints: rejects configs with adapters, deepspeed, or non-FSDP2 setups; requires `use_orig_params=True` when FSDP is used; warns on missing mixed precision; defaults `optim_target_modules` to `["attn", "mlp"]`. Unit tests validate adapter rejection, FSDP1 rejection, and default field population.
Trainer Integration and Dependency `pyproject.toml`, `src/axolotl/core/builders/base.py`	Adds `q-galore-torch==1.0` to optional dependencies. `TrainerBuilderBase._configure_optimizer` wires `q_galore_adamw8bit` by patching Q-GaLore, importing `QGaLoreAdamW8bit`, and building parameter groups from configuration values. `get_callbacks` simplifies GCCallback wiring to use `gc_steps` directly without fallback logic.
End-to-End Test `tests/e2e/test_optimizers.py`	`test_q_galore_adamw8bit` conditionally runs when `q_galore_torch` is available, configures training with small rank/gap/group_size values and bf16, validates config, loads datasets, trains, and asserts the optimizer class name contains `AdamW8bit`.
User Documentation `docs/optimizers.qmd`	Documents `q_galore_adamw8bit` under Custom Optimizers with full fine-tuning requirement, incompatibilities (adapters, 4-bit, 8-bit loading, DeepSpeed), FSDP constraints, installation command, explanation, and example YAML configuration with configurable parameters.
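
The partitioning described for `build_qgalore_param_groups()` can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `split_param_groups`, `FakeParam`, substring matching on parameter names, the group order, and the `rank` / `update_proj_gap` keys are all assumptions made for the example. The function only reads `.ndim` and `.requires_grad`, which torch parameters also expose.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class FakeParam:
    """Hypothetical stand-in with the two attributes read from a torch parameter."""

    ndim: int
    requires_grad: bool = True


def split_param_groups(named_params, target_modules, **projection_kwargs):
    """Split trainable params into a projected group and a plain group."""
    projected, plain = [], []
    for name, param in named_params:
        if not param.requires_grad:
            continue  # frozen parameters are left out of both groups
        # Assumption: a target matches when it is a substring of the name.
        is_target = any(key in name for key in target_modules)
        if param.ndim == 2 and is_target:
            projected.append(param)
        else:
            plain.append(param)
    return [
        {"params": projected, **projection_kwargs},
        {"params": plain},
    ]


named = [
    ("model.layers.0.self_attn.q_proj.weight", FakeParam(ndim=2)),
    ("model.layers.0.mlp.up_proj.weight", FakeParam(ndim=2)),
    ("model.layers.0.input_layernorm.weight", FakeParam(ndim=1)),
    ("model.embed_tokens.weight", FakeParam(ndim=2)),
    ("model.layers.0.mlp.down_proj.weight", FakeParam(ndim=2, requires_grad=False)),
]
groups = split_param_groups(named, ["attn", "mlp"], rank=8, update_proj_gap=50)
```

With the default targets `["attn", "mlp"]`, the two trainable 2D attention and MLP weights land in the projected group, while the 1D norm weight and the embedding matrix fall through to the plain group.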

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

ready to merge

Suggested reviewers

winglian

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'feat-qgalore' is vague and generic, using a feature branch naming convention rather than a descriptive summary of the actual changes.	Use a more descriptive title that clearly summarizes the main change, such as 'Add Q-GaLore optimizer support' or 'Integrate Q-GaLore custom optimizer with configuration validation'.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/axolotl/utils/optimizers/qgalore.py (1)

29-31: ⚡ Quick win

Consider unpacking syntax for clearer tuple construction.

Static analysis suggests using unpacking syntax instead of tuple concatenation, which is more idiomatic and readable in Python.

♻️ Proposed refactor

         optimizer_update_8bit_blockwise=(
             lambda *a, **kw: bw(
-                *(a[:7] + (0.0, 0.0) + a[7:] if len(a) == 15 else a), **kw
+                *((*a[:7], 0.0, 0.0, *a[7:]) if len(a) == 15 else a), **kw
             )
         ),
         optimizer_update_32bit=(
             lambda *a, **kw: fp32(
-                *(a[:10] + (0.0, 0.0) + a[10:] if len(a) == 13 else a), **kw
+                *((*a[:10], 0.0, 0.0, *a[10:]) if len(a) == 13 else a), **kw
             )
         ),

As per the coding guidelines, the Ruff static analysis tool flagged RUF005 here: prefer unpacking over concatenation.

Also applies to: 34-36
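
For context, the pattern behind these wrappers (detect a changed signature, then pad old-style positional calls) can be sketched in isolation. This is a simplified illustration under stated assumptions, not the PR's code: `new_style_update` and `make_shim` are hypothetical names, and the arities are far smaller than the 13- and 15-argument bitsandbytes calls in the diff. It uses the unpacking form suggested above.

```python
import inspect


def new_style_update(a, b, c, extra1, extra2, d):
    """Stand-in for a function that gained two positional parameters."""
    return (a, b, c, extra1, extra2, d)


def make_shim(func, insert_at, old_arity, fill=(0.0, 0.0)):
    """Wrap func so callers using the old positional arity keep working."""
    new_arity = len(inspect.signature(func).parameters)
    if new_arity != old_arity + len(fill):
        # The signature did not grow as expected, so leave func untouched.
        return func

    def shim(*a, **kw):
        if len(a) == old_arity:
            # Unpacking form preferred by RUF005 over tuple concatenation.
            a = (*a[:insert_at], *fill, *a[insert_at:])
        return func(*a, **kw)

    return shim


patched = make_shim(new_style_update, insert_at=3, old_arity=4)
old_call = patched(1, 2, 3, 4)            # old-style call gets padded
new_call = patched(1, 2, 3, 0.5, 0.5, 4)  # new-style call passes through
```

Checking the arity before wrapping keeps the shim a no-op when the installed library already matches what the caller expects.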

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/utils/optimizers/qgalore.py` around lines 29 - 31, Replace the
tuple-concatenation used in the wrapper lambda with Python unpacking for
readability and to satisfy RUF005: instead of a[:7] + (0.0, 0.0) + a[7:],
construct the args as (*a[:7], 0.0, 0.0, *a[7:]) inside the lambda that wraps bw
(the anonymous lambda that calls bw(*(…), **kw)); apply the same unpacking
change to the other similar wrapper occurrences around the bw calls referenced
in the diff.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/axolotl/utils/schemas/validation.py`:
- Around line 916-954: In check_qgalore, add explicit validation to reject
quantized model-loading flags when optimizer == "q_galore_adamw8bit": check
data.get("load_in_8bit") and data.get("load_in_4bit") (and any equivalent keys
used elsewhere, e.g., "bnb_4bit") and raise a ValueError with a clear message
that q_galore_adamw8bit is incompatible with those settings; update the function
(check_qgalore) so these checks occur alongside the existing
adapter/deepspeed/fsdp checks before returning data.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f9165850-7c40-42c0-9b9a-5f607db7dda9

📥 Commits

Reviewing files that changed from the base of the PR and between d7cb1c9 and 5e704b6.

📒 Files selected for processing (8)

docs/optimizers.qmd
pyproject.toml
src/axolotl/core/builders/base.py
src/axolotl/utils/optimizers/qgalore.py
src/axolotl/utils/schemas/training.py
src/axolotl/utils/schemas/validation.py
tests/e2e/test_optimizers.py
tests/utils/schemas/validation/test_qgalore.py

coderabbitai · 2026-05-14T17:27:19Z

+    def check_qgalore(cls, data):
+        if data.get("optimizer") != "q_galore_adamw8bit":
+            return data
+        adapter = data.get("adapter")
+        if adapter:
+            raise ValueError(
+                "q_galore_adamw8bit operates on full-precision parameters and is "
+                f"incompatible with adapter='{adapter}'. Remove the adapter setting "
+                "or pick a different optimizer."
+            )
+        if data.get("deepspeed"):
+            raise ValueError(
+                "q_galore_adamw8bit is not yet validated with DeepSpeed. "
+                "Use DDP or FSDP2 with use_orig_params=True."
+            )
+        if data.get("fsdp") or data.get("fsdp_config"):
+            fsdp_version = cls._resolve_fsdp_version(data)
+            if str(fsdp_version) != "2":
+                raise ValueError(
+                    "q_galore_adamw8bit requires FSDP2. Set fsdp_version: 2."
+                )
+            fsdp_config = data.get("fsdp_config") or {}
+            if fsdp_config.get("use_orig_params") is not True:
+                raise ValueError(
+                    "q_galore_adamw8bit requires fsdp_config.use_orig_params=True so "
+                    "that per-parameter projection state survives FSDP sharding."
+                )
+        if not (data.get("bf16") or data.get("bfloat16") or data.get("fp16")):
+            LOG.warning(
+                "q_galore_adamw8bit benefits from mixed-precision (bf16/fp16). "
+                "Running in fp32 will negate most of the memory savings."
+            )
+        if data.get("optim_target_modules") is None:
+            # Match the reference impl's defaults: attention + MLP linears.
+            data["optim_target_modules"] = [
+                "attn",
+                "mlp",
+            ]
+        return data


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing validation for incompatible quantization settings.

The documentation (optimizers.qmd:144-146) states that q_galore_adamw8bit is incompatible with load_in_8bit and load_in_4bit, but the validator only rejects adapters, DeepSpeed, and non-FSDP2 setups; it does not reject these quantization options. This could lead to runtime errors or undefined behavior.

🛡️ Proposed fix to add validation

     @classmethod
     def check_qgalore(cls, data):
         if data.get("optimizer") != "q_galore_adamw8bit":
             return data
         adapter = data.get("adapter")
         if adapter:
             raise ValueError(
                 "q_galore_adamw8bit operates on full-precision parameters and is "
                 f"incompatible with adapter='{adapter}'. Remove the adapter setting "
                 "or pick a different optimizer."
             )
+        if data.get("load_in_8bit"):
+            raise ValueError(
+                "q_galore_adamw8bit is incompatible with load_in_8bit. "
+                "Use full-precision model loading."
+            )
+        if data.get("load_in_4bit"):
+            raise ValueError(
+                "q_galore_adamw8bit is incompatible with load_in_4bit. "
+                "Use full-precision model loading."
+            )
         if data.get("deepspeed"):
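
The proposed checks could be exercised with a small test along these lines. This is a sketch only: `reject_quantized_loading` is a hypothetical standalone stand-in for the quantization part of `check_qgalore`, written so that it runs without importing Axolotl, and `raises_value_error` is a helper invented for the example.

```python
def reject_quantized_loading(data: dict) -> dict:
    """Hypothetical stand-in for the proposed load_in_8bit/load_in_4bit checks."""
    if data.get("optimizer") != "q_galore_adamw8bit":
        return data  # other optimizers are not affected
    for flag in ("load_in_8bit", "load_in_4bit"):
        if data.get(flag):
            raise ValueError(
                f"q_galore_adamw8bit is incompatible with {flag}. "
                "Use full-precision model loading."
            )
    return data


def raises_value_error(data: dict) -> bool:
    """Return True when the config is rejected by the checks."""
    try:
        reject_quantized_loading(data)
    except ValueError:
        return True
    return False


rejected_8bit = raises_value_error(
    {"optimizer": "q_galore_adamw8bit", "load_in_8bit": True}
)
rejected_4bit = raises_value_error(
    {"optimizer": "q_galore_adamw8bit", "load_in_4bit": True}
)
accepted_plain = not raises_value_error({"optimizer": "q_galore_adamw8bit"})
accepted_other = not raises_value_error(
    {"optimizer": "adamw_torch", "load_in_4bit": True}
)
```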

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    def check_qgalore(cls, data):
        if data.get("optimizer") != "q_galore_adamw8bit":
            return data
        adapter = data.get("adapter")
        if adapter:
            raise ValueError(
                "q_galore_adamw8bit operates on full-precision parameters and is "
                f"incompatible with adapter='{adapter}'. Remove the adapter setting "
                "or pick a different optimizer."
            )
        if data.get("load_in_8bit"):
            raise ValueError(
                "q_galore_adamw8bit is incompatible with load_in_8bit. "
                "Use full-precision model loading."
            )
        if data.get("load_in_4bit"):
            raise ValueError(
                "q_galore_adamw8bit is incompatible with load_in_4bit. "
                "Use full-precision model loading."
            )
        if data.get("deepspeed"):
            raise ValueError(
                "q_galore_adamw8bit is not yet validated with DeepSpeed. "
                "Use DDP or FSDP2 with use_orig_params=True."
            )
        if data.get("fsdp") or data.get("fsdp_config"):
            fsdp_version = cls._resolve_fsdp_version(data)
            if str(fsdp_version) != "2":
                raise ValueError(
                    "q_galore_adamw8bit requires FSDP2. Set fsdp_version: 2."
                )
            fsdp_config = data.get("fsdp_config") or {}
            if fsdp_config.get("use_orig_params") is not True:
                raise ValueError(
                    "q_galore_adamw8bit requires fsdp_config.use_orig_params=True so "
                    "that per-parameter projection state survives FSDP sharding."
                )
        if not (data.get("bf16") or data.get("bfloat16") or data.get("fp16")):
            LOG.warning(
                "q_galore_adamw8bit benefits from mixed-precision (bf16/fp16). "
                "Running in fp32 will negate most of the memory savings."
            )
        if data.get("optim_target_modules") is None:
            # Match the reference impl's defaults: attention + MLP linears.
            data["optim_target_modules"] = [
                "attn",
                "mlp",
            ]
        return data

codecov · 2026-05-15T04:06:35Z

Codecov Report

❌ Patch coverage is 43.28358% with 38 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/axolotl/utils/optimizers/qgalore.py	0.00%	25 Missing ⚠️
src/axolotl/core/builders/base.py	11.11%	8 Missing ⚠️
src/axolotl/utils/schemas/validation.py	77.27%	5 Missing ⚠️
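
The headline figure can be cross-checked against the per-file rows with a little arithmetic. The sketch below is a reader's reconstruction, not Codecov output: it assumes each percentage is covered lines divided by changed lines, and infers the totals by rounding. `changed_lines` is a hypothetical helper.

```python
def changed_lines(missing: int, patch_pct: float) -> int:
    """Infer total changed lines from missing lines and patch coverage."""
    return round(missing / (1 - patch_pct / 100))


# Overall: 38 missing lines at 43.28358% patch coverage.
total = changed_lines(38, 43.28358)
covered = total - 38

# Per-file rows from the report: (missing lines, patch %).
rows = {
    "qgalore.py": (25, 0.00),
    "base.py": (8, 11.11),
    "validation.py": (5, 77.27),
}
per_file = {name: changed_lines(m, pct) for name, (m, pct) in rows.items()}
listed_total = sum(per_file.values())

# Changed lines in files with no missing coverage, which the table omits.
unlisted = total - listed_total
```

Under those assumptions the patch has 67 changed lines, 29 of them covered. The three listed files account for 56 lines, so the remaining 11 would sit in files that are fully covered and therefore not shown.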

feat-qgalore

5e704b6

coderabbitai Bot reviewed May 14, 2026

spell check

76dfda4

NanoCode012 approved these changes May 18, 2026

NanoCode012 added the ready to merge label May 19, 2026

NanoCode012 merged commit 20f56fa into axolotl-ai-cloud:main May 22, 2026
15 of 18 checks passed

feat-qgalore #3654: NanoCode012 merged 2 commits into axolotl-ai-cloud:main from ved1beta:new-qgalore
