feat-qgalore #3654

Merged
NanoCode012 merged 2 commits into axolotl-ai-cloud:main from ved1beta:new-qgalore
May 22, 2026
Conversation

@ved1beta ved1beta commented May 14, 2026

Member

Description

feat-qgalore
https://arxiv.org/pdf/2407.08296

Motivation and Context

#1752

How has this been tested?

unit test + manual run

AI Usage Disclaimer

claude opus helped with ideation and testing

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Q-GaLore optimizer (q_galore_adamw8bit) with configurable rank, projection, and quantization parameters for memory-efficient fine-tuning.
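    • Requires full fine-tuning mode (incompatible with adapters). When FSDP is used, it must be FSDP2 with use_orig_params=True. Mixed precision (bf16 or fp16) is recommended; its absence only triggers a warning.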
  • Documentation

    • Added Q-GaLore optimizer installation guidance and complete YAML configuration examples.

coderabbitai Bot commented May 14, 2026

Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2be2eada-229a-47b1-9db4-93f45456c885

Walkthrough

This PR adds complete support for the Q-GaLore optimizer to Axolotl by introducing utility functions for bitsandbytes compatibility and parameter grouping, extending the configuration schema with Q-GaLore-specific hyperparameters, validating incompatibilities and constraints, integrating the optimizer into the trainer builder, testing end-to-end training, and documenting the optimizer for users.
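The bitsandbytes compatibility step mentioned above (detect a newer API by inspecting a function signature, then wrap it so older call sites keep working) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `qgalore.py`: `legacy_update`, `modern_update`, the `beta3`/`alpha` parameter names, and `patch_for_modern_backend` are hypothetical stand-ins.

```python
import inspect
from types import SimpleNamespace


def legacy_update(p, g, lr, step):
    """Hypothetical stand-in for the older positional API."""
    return ("legacy", p, g, lr, step)


def modern_update(p, g, beta3, alpha, lr, step):
    """Hypothetical stand-in for a newer API with two extra positional parameters."""
    return ("modern", p, g, beta3, alpha, lr, step)


def needs_patch(fn, marker="beta3"):
    """Detect the newer API by looking for a parameter the older one lacks."""
    return marker in inspect.signature(fn).parameters


def patch_for_modern_backend(backend, insert_at=2):
    """Wrap backend.update so legacy-length calls still work; return True if patched."""
    original = backend.update
    if not needs_patch(original):
        return False
    # A legacy caller passes two fewer positional arguments than the new API expects.
    legacy_len = len(inspect.signature(original).parameters) - 2

    def shim(*args, **kwargs):
        if len(args) == legacy_len:
            # Fill the two new slots with neutral defaults.
            args = (*args[:insert_at], 0.0, 0.0, *args[insert_at:])
        return original(*args, **kwargs)

    backend.update = shim
    return True


backend = SimpleNamespace(update=modern_update)
patched = patch_for_modern_backend(backend)
```

Calls that already pass the full argument list go through the shim unchanged, so the wrapper is safe for both old and new call sites.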

Changes

Q-GaLore Optimizer Support

Layer / File(s) Summary
Q-GaLore Integration Utilities
src/axolotl/utils/optimizers/qgalore.py
patch_q_galore_for_modern_bnb() detects newer bitsandbytes via function signature inspection and monkey-patches Q-GaLore's optimizer update functions for compatibility. build_qgalore_param_groups() partitions trainable parameters into two optimizer groups: 2D weights matching target modules for Q-GaLore projection, and all other trainable parameters in a plain group.
Configuration Schema
src/axolotl/utils/schemas/training.py
Adds 10 optional Q-GaLore hyperparameter fields to HyperparametersConfig with defaults and descriptions covering rank, projection update cadence, scaling, projection type, INT quantization controls, and adaptive-subspace update parameters.
Configuration Validation and Tests
src/axolotl/utils/schemas/validation.py, tests/utils/schemas/validation/test_qgalore.py
check_qgalore validator enforces optimizer constraints: rejects configs that set an adapter or DeepSpeed; when FSDP is configured, rejects any version other than FSDP2 and requires use_orig_params=True; warns when neither bf16 nor fp16 is enabled; defaults optim_target_modules to ["attn", "mlp"] when unset. Unit tests validate adapter rejection, FSDP1 rejection, and default field population.
Trainer Integration and Dependency
pyproject.toml, src/axolotl/core/builders/base.py
Adds q-galore-torch==1.0 to optional dependencies. TrainerBuilderBase._configure_optimizer wires q_galore_adamw8bit by patching Q-GaLore, importing QGaLoreAdamW8bit, and building parameter groups from configuration values. get_callbacks simplifies GCCallback wiring to use gc_steps directly without fallback logic.
End-to-End Test
tests/e2e/test_optimizers.py
test_q_galore_adamw8bit conditionally runs when q_galore_torch is available, configures training with small rank/gap/group_size values and bf16, validates config, loads datasets, trains, and asserts the optimizer class name contains AdamW8bit.
User Documentation
docs/optimizers.qmd
Documents q_galore_adamw8bit under Custom Optimizers with full fine-tuning requirement, incompatibilities (adapters, 4-bit, 8-bit loading, DeepSpeed), FSDP constraints, installation command, explanation, and example YAML configuration with configurable parameters.
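The parameter partitioning described for `build_qgalore_param_groups()` can be illustrated without PyTorch. The sketch below is not the PR's implementation: `ParamInfo` and `split_param_groups` are hypothetical names, substring matching against `target_modules` is an assumption, and so is the order of the two returned groups.

```python
from dataclasses import dataclass


@dataclass
class ParamInfo:
    """Hypothetical stand-in for a (name, tensor) pair from named_parameters()."""
    name: str
    ndim: int
    requires_grad: bool = True


def split_param_groups(params, target_modules, **projection_kwargs):
    """Return [projected_group, plain_group] as optimizer-style dicts."""
    projected, plain = [], []
    for p in params:
        if not p.requires_grad:
            continue  # frozen parameters are not handed to the optimizer
        # Assumption: a parameter matches if any target string occurs in its name.
        is_target = any(key in p.name for key in target_modules)
        if p.ndim == 2 and is_target:
            projected.append(p)
        else:
            plain.append(p)
    return [
        {"params": projected, **projection_kwargs},
        {"params": plain},
    ]


params = [
    ParamInfo("model.layers.0.self_attn.q_proj.weight", 2),
    ParamInfo("model.layers.0.mlp.up_proj.weight", 2),
    ParamInfo("model.layers.0.input_layernorm.weight", 1),
    ParamInfo("model.embed_tokens.weight", 2),
    ParamInfo("model.layers.0.mlp.down_proj.weight", 2, requires_grad=False),
]
groups = split_param_groups(params, ["attn", "mlp"], rank=8)
```

With the default targets `["attn", "mlp"]`, the embedding matrix is 2D but does not match a target, so it lands in the plain group alongside the 1D norm weight.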

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

ready to merge

Suggested reviewers

  • winglian
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check | ❓ Inconclusive | The title 'feat-qgalore' is vague and generic, using a feature branch naming convention rather than a descriptive summary of the actual changes. | Use a more descriptive title that clearly summarizes the main change, such as 'Add Q-GaLore optimizer support' or 'Integrate Q-GaLore custom optimizer with configuration validation'.
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/axolotl/utils/optimizers/qgalore.py (1)

29-31: ⚡ Quick win

Consider unpacking syntax for clearer tuple construction.

Static analysis suggests using unpacking syntax instead of tuple concatenation, which is more idiomatic and readable in Python.

♻️ Proposed refactor
         optimizer_update_8bit_blockwise=(
             lambda *a, **kw: bw(
-                *(a[:7] + (0.0, 0.0) + a[7:] if len(a) == 15 else a), **kw
+                *((*a[:7], 0.0, 0.0, *a[7:]) if len(a) == 15 else a), **kw
             )
         ),
         optimizer_update_32bit=(
             lambda *a, **kw: fp32(
-                *(a[:10] + (0.0, 0.0) + a[10:] if len(a) == 13 else a), **kw
+                *((*a[:10], 0.0, 0.0, *a[10:]) if len(a) == 13 else a), **kw
             )
         ),

As per coding guidelines, Ruff static analysis tool flagged RUF005: prefer unpacking over concatenation.

Also applies to: 34-36
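To see that the suggested refactor is behavior-preserving, the two padding styles can be compared directly. The helper names `pad_concat` and `pad_unpack` are hypothetical and exist only for this comparison; the argument counts (15 and 13) and split points (7 and 10) are taken from the diff above.

```python
def pad_concat(a, at):
    """Original style: build the padded tuple by concatenation."""
    return a[:at] + (0.0, 0.0) + a[at:]


def pad_unpack(a, at):
    """Suggested style: build the same tuple with unpacking (RUF005)."""
    return (*a[:at], 0.0, 0.0, *a[at:])


# Placeholder positional arguments standing in for the optimizer-update call.
args15 = tuple(range(15))  # blockwise 8-bit wrapper: pad after the 7th argument
args13 = tuple(range(13))  # 32-bit wrapper: pad after the 10th argument

blockwise_padded = pad_unpack(args15, 7)
fp32_padded = pad_unpack(args13, 10)
```

Both forms give identical tuples when `a` is a tuple, which is always the case for `*a` inside a lambda. The unpacking form additionally accepts a list, where `list + tuple` concatenation would raise a `TypeError`.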

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/utils/optimizers/qgalore.py` around lines 29 - 31, Replace the
tuple-concatenation used in the wrapper lambda with Python unpacking for
readability and to satisfy RUF005: instead of a[:7] + (0.0, 0.0) + a[7:],
construct the args as (*a[:7], 0.0, 0.0, *a[7:]) inside the lambda that wraps bw
(the anonymous lambda that calls bw(*(…), **kw)); apply the same unpacking
change to the optimizer_update_32bit wrapper around fp32 (lines 34 - 36), using
(*a[:10], 0.0, 0.0, *a[10:]).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f9165850-7c40-42c0-9b9a-5f607db7dda9

📥 Commits

Reviewing files that changed from the base of the PR and between d7cb1c9 and 5e704b6.

📒 Files selected for processing (8)
  • docs/optimizers.qmd
  • pyproject.toml
  • src/axolotl/core/builders/base.py
  • src/axolotl/utils/optimizers/qgalore.py
  • src/axolotl/utils/schemas/training.py
  • src/axolotl/utils/schemas/validation.py
  • tests/e2e/test_optimizers.py
  • tests/utils/schemas/validation/test_qgalore.py

Comment on lines +916 to +954
    def check_qgalore(cls, data):
        if data.get("optimizer") != "q_galore_adamw8bit":
            return data
        adapter = data.get("adapter")
        if adapter:
            raise ValueError(
                "q_galore_adamw8bit operates on full-precision parameters and is "
                f"incompatible with adapter='{adapter}'. Remove the adapter setting "
                "or pick a different optimizer."
            )
        if data.get("deepspeed"):
            raise ValueError(
                "q_galore_adamw8bit is not yet validated with DeepSpeed. "
                "Use DDP or FSDP2 with use_orig_params=True."
            )
        if data.get("fsdp") or data.get("fsdp_config"):
            fsdp_version = cls._resolve_fsdp_version(data)
            if str(fsdp_version) != "2":
                raise ValueError(
                    "q_galore_adamw8bit requires FSDP2. Set fsdp_version: 2."
                )
            fsdp_config = data.get("fsdp_config") or {}
            if fsdp_config.get("use_orig_params") is not True:
                raise ValueError(
                    "q_galore_adamw8bit requires fsdp_config.use_orig_params=True so "
                    "that per-parameter projection state survives FSDP sharding."
                )
        if not (data.get("bf16") or data.get("bfloat16") or data.get("fp16")):
            LOG.warning(
                "q_galore_adamw8bit benefits from mixed-precision (bf16/fp16). "
                "Running in fp32 will negate most of the memory savings."
            )
        if data.get("optim_target_modules") is None:
            # Match the reference impl's defaults: attention + MLP linears.
            data["optim_target_modules"] = [
                "attn",
                "mlp",
            ]
        return data

Contributor

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing validation for incompatible quantization settings.

The documentation (optimizers.qmd:144-146) states that q_galore_adamw8bit is incompatible with load_in_8bit and load_in_4bit, but the validator only checks for adapter and doesn't reject these quantization options. This could lead to runtime errors or undefined behavior.

🛡️ Proposed fix to add validation
     @classmethod
     def check_qgalore(cls, data):
         if data.get("optimizer") != "q_galore_adamw8bit":
             return data
         adapter = data.get("adapter")
         if adapter:
             raise ValueError(
                 "q_galore_adamw8bit operates on full-precision parameters and is "
                 f"incompatible with adapter='{adapter}'. Remove the adapter setting "
                 "or pick a different optimizer."
             )
+        if data.get("load_in_8bit"):
+            raise ValueError(
+                "q_galore_adamw8bit is incompatible with load_in_8bit. "
+                "Use full-precision model loading."
+            )
+        if data.get("load_in_4bit"):
+            raise ValueError(
+                "q_galore_adamw8bit is incompatible with load_in_4bit. "
+                "Use full-precision model loading."
+            )
         if data.get("deepspeed"):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    def check_qgalore(cls, data):
        if data.get("optimizer") != "q_galore_adamw8bit":
            return data
        adapter = data.get("adapter")
        if adapter:
            raise ValueError(
                "q_galore_adamw8bit operates on full-precision parameters and is "
                f"incompatible with adapter='{adapter}'. Remove the adapter setting "
                "or pick a different optimizer."
            )
        if data.get("load_in_8bit"):
            raise ValueError(
                "q_galore_adamw8bit is incompatible with load_in_8bit. "
                "Use full-precision model loading."
            )
        if data.get("load_in_4bit"):
            raise ValueError(
                "q_galore_adamw8bit is incompatible with load_in_4bit. "
                "Use full-precision model loading."
            )
        if data.get("deepspeed"):
            raise ValueError(
                "q_galore_adamw8bit is not yet validated with DeepSpeed. "
                "Use DDP or FSDP2 with use_orig_params=True."
            )
        if data.get("fsdp") or data.get("fsdp_config"):
            fsdp_version = cls._resolve_fsdp_version(data)
            if str(fsdp_version) != "2":
                raise ValueError(
                    "q_galore_adamw8bit requires FSDP2. Set fsdp_version: 2."
                )
            fsdp_config = data.get("fsdp_config") or {}
            if fsdp_config.get("use_orig_params") is not True:
                raise ValueError(
                    "q_galore_adamw8bit requires fsdp_config.use_orig_params=True so "
                    "that per-parameter projection state survives FSDP sharding."
                )
        if not (data.get("bf16") or data.get("bfloat16") or data.get("fp16")):
            LOG.warning(
                "q_galore_adamw8bit benefits from mixed-precision (bf16/fp16). "
                "Running in fp32 will negate most of the memory savings."
            )
        if data.get("optim_target_modules") is None:
            # Match the reference impl's defaults: attention + MLP linears.
            data["optim_target_modules"] = [
                "attn",
                "mlp",
            ]
        return data
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/utils/schemas/validation.py` around lines 916 - 954, In
check_qgalore, add explicit validation to reject quantized model-loading flags
when optimizer == "q_galore_adamw8bit": check data.get("load_in_8bit") and
data.get("load_in_4bit") (and any equivalent keys used elsewhere, e.g.,
"bnb_4bit") and raise a ValueError with a clear message that q_galore_adamw8bit
is incompatible with those settings; update the function (check_qgalore) so
these checks occur alongside the existing adapter/deepspeed/fsdp checks before
returning data.
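One way the requested checks could be exercised in isolation is sketched below. `reject_quantized_loading` and `raises_value_error` are hypothetical standalone helpers that mirror the proposed error messages; they are not the validator in `validation.py`, and the config dicts contain only keys that appear in this review.

```python
def reject_quantized_loading(data):
    """Raise if Q-GaLore is combined with quantized model loading (sketch only)."""
    if data.get("optimizer") != "q_galore_adamw8bit":
        return data  # other optimizers are out of scope for this check
    for flag in ("load_in_8bit", "load_in_4bit"):
        if data.get(flag):
            raise ValueError(
                f"q_galore_adamw8bit is incompatible with {flag}. "
                "Use full-precision model loading."
            )
    return data


def raises_value_error(data):
    """Small helper so the checks below read as plain booleans."""
    try:
        reject_quantized_loading(data)
    except ValueError:
        return True
    return False


# A config that should pass: full-precision loading with bf16 enabled.
ok_cfg = {"optimizer": "q_galore_adamw8bit", "bf16": True}
bad_8bit = {"optimizer": "q_galore_adamw8bit", "load_in_8bit": True}
bad_4bit = {"optimizer": "q_galore_adamw8bit", "load_in_4bit": True}
# Quantized loading stays legal for optimizers this check does not cover.
other_opt = {"optimizer": "adamw_torch", "load_in_4bit": True}
```

Looping over the flag names keeps the two error messages consistent, at the cost of differing slightly in shape from the two explicit `if` blocks in the proposed fix.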

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 43.28358% with 38 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/axolotl/utils/optimizers/qgalore.py | 0.00% | 25 Missing ⚠️
src/axolotl/core/builders/base.py | 11.11% | 8 Missing ⚠️
src/axolotl/utils/schemas/validation.py | 77.27% | 5 Missing ⚠️
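The percentages in this report can be cross-checked with a little arithmetic. The sketch below assumes that patch coverage is computed as covered lines divided by total changed lines, which is an assumption about how the figures were derived and is not stated in the report; `total_lines` is a hypothetical helper.

```python
def total_lines(missing, coverage_pct):
    """Invert coverage = (total - missing) / total to recover the line count."""
    return round(missing / (1 - coverage_pct / 100))


# (missing lines, patch coverage %) as listed in the report above.
report = {
    "src/axolotl/utils/optimizers/qgalore.py": (25, 0.00),
    "src/axolotl/core/builders/base.py": (8, 11.11),
    "src/axolotl/utils/schemas/validation.py": (5, 77.27),
}

patch_total = total_lines(38, 43.28358)
listed_total = sum(total_lines(m, pct) for m, pct in report.values())
# Changed lines not accounted for by the three listed files.
unlisted_lines = patch_total - listed_total
```

Under that assumption the patch touches 67 measured lines, of which the three listed files account for 56. The remaining 11 would then sit in files with no missing lines, which is an inference and not something the report states.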

@NanoCode012 NanoCode012 merged commit 20f56fa into axolotl-ai-cloud:main May 22, 2026
15 of 18 checks passed

2 participants