Lora kernels bias support by djsaunde · Pull Request #3025 · axolotl-ai-cloud/axolotl

djsaunde · 2025-08-06T18:29:43Z

Summary by CodeRabbit

New Features
- Improved support for bias tensors in LoRA computations, ensuring biases are consistently integrated into all relevant modules.
Bug Fixes
- Corrected handling of weight and bias gradients in LoRA modules, improving reliability during training and inference.
Chores
- Updated logic for applying optimized LoRA kernel patches, removing unnecessary restrictions related to bias presence.
Tests
- Enhanced LoRA kernel tests to include bias tensors, validating bias integration in forward and backward passes.

coderabbitai · 2025-08-06T18:29:49Z

Caution

Review failed

The pull request is closed.

Walkthrough

The changes extend LoRA kernel support to handle bias tensors alongside weights during parameter extraction and matrix multiplications. All relevant LoRA autograd functions and application methods are updated to accept, propagate, and utilize bias tensors. Additionally, the LoRA kernel patching logic is relaxed to allow patching even when biases are present, updating related eligibility checks and warning messages.

Changes

Cohort / File(s)	Change Summary
LoRA Kernel Bias Support `src/axolotl/kernels/lora.py`	Adds support for bias tensors in LoRA parameter extraction, matrix multiplication, and all related autograd functions and application methods. Updates function and method signatures, forward and backward logic, and parameter unpacking to include biases.
LoRA Kernel Patching Logic `src/axolotl/monkeypatch/lora_kernels.py`	Removes requirement that base layer bias must be `None` for LoRA kernel patching. Updates eligibility checks and warning messages to reflect new logic, allowing patching when biases are present.
LoRA Kernel Tests `tests/e2e/kernels/test_lora.py`	Updates test cases to include bias tensors in fixtures and function calls. Adjusts expected outputs and gradient checks to account for bias addition in LoRA kernel computations.
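The relaxed patching rule in `lora_kernels.py` can be pictured as a before/after predicate. This is a minimal sketch, not axolotl's code: `FakeLinear`, `FakeLoraLayer`, `can_patch_old` and `can_patch_new` are hypothetical names. It encodes only what the review describes, namely that a layer needs a LoRA adapter and no DoRA, and that the old extra requirement `base_layer.bias is None` is dropped.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins for a base linear layer and its LoRA wrapper;
# these are NOT axolotl or PEFT classes.
@dataclass
class FakeLinear:
    weight: List[List[float]]
    bias: Optional[List[float]] = None

@dataclass
class FakeLoraLayer:
    base_layer: FakeLinear
    has_lora_adapter: bool = True
    use_dora: bool = False

def can_patch_old(layer: FakeLoraLayer) -> bool:
    # Pre-PR rule as described: LoRA adapter, no DoRA, and no base-layer bias.
    return (
        layer.has_lora_adapter
        and not layer.use_dora
        and layer.base_layer.bias is None
    )

def can_patch_new(layer: FakeLoraLayer) -> bool:
    # Post-PR rule as described: the bias requirement is dropped.
    return layer.has_lora_adapter and not layer.use_dora

biased = FakeLoraLayer(FakeLinear(weight=[[1.0]], bias=[0.5]))
print(can_patch_old(biased), can_patch_new(biased))  # False True
```

DoRA layers remain ineligible under both rules. Only biased LoRA layers change status.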

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

Lora kernels bias support #3025: Modifies the same LoRA kernel functions and classes to add bias tensor support, closely related at the code level.

Suggested reviewers

winglian

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d7d365d and 7c5eaa2.

📒 Files selected for processing (1)

src/axolotl/monkeypatch/lora_kernels.py (3 hunks)

codecov · 2025-08-06T18:40:43Z

Codecov Report

❌ Patch coverage is 0% with 40 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/axolotl/kernels/lora.py	0.00%	40 Missing ⚠️

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/axolotl/kernels/lora.py (1)
546-546: Remove unused ctx.biases assignment.

The ctx.biases tuple is saved in the forward pass but never used in the backward pass. This unnecessary storage can be removed to save memory.
-        ctx.biases = (q_bias, k_bias, v_bias)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e442ff2 and ca2164a.

📒 Files selected for processing (2)

src/axolotl/kernels/lora.py (26 hunks)
src/axolotl/monkeypatch/lora_kernels.py (3 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/axolotl/kernels/lora.py (1)

src/axolotl/kernels/quantize.py (1)

dequantize (20-150)

🔇 Additional comments (12)

src/axolotl/monkeypatch/lora_kernels.py (3)

402-403: LGTM! Warning message correctly updated for bias support.

The updated warning message accurately reflects the new requirements (a LoRA adapter and no DoRA) without the previous bias restriction.

420-421: LGTM! Consistent warning message update.

The output projection warning is consistently updated to match the QKV warning format.

437-438: LGTM! MLP warning message properly updated.

The MLP layer warning message is consistently updated to remove the bias restriction.

src/axolotl/kernels/lora.py (9)

25-80: LGTM! Bias extraction properly implemented in get_lora_parameters.

The function now correctly extracts and returns the bias tensor from the base layer, with proper handling for both cases (with and without adapters).
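The two code paths described here can be sketched as follows. This sketch rests on several assumptions. `BaseLayer`, `LoraProj` and `get_lora_parameters_sketch` are hypothetical. The return order `(W, b, quant_state, A, B, s)` is also assumed; it groups weight and bias together in the way the review mentions, but it is not necessarily the real signature. The point is that the bias is returned even when no adapter is active, so the base projection stays correct.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass
class BaseLayer:  # hypothetical frozen base linear layer
    weight: Any
    bias: Optional[Any] = None
    quant_state: Optional[Any] = None

@dataclass
class LoraProj:  # hypothetical LoRA-wrapped projection
    base_layer: BaseLayer
    lora_A: Optional[Any] = None
    lora_B: Optional[Any] = None
    scaling: Optional[float] = None
    merged: bool = False

def get_lora_parameters_sketch(proj: LoraProj) -> Tuple:
    base = proj.base_layer
    W, b, quant = base.weight, base.bias, base.quant_state
    # No active adapter (missing or already merged): still return the bias,
    # so the caller's base projection X W^T + b stays correct.
    if proj.lora_A is None or proj.merged:
        return W, b, quant, None, None, None
    return W, b, quant, proj.lora_A, proj.lora_B, proj.scaling
```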

82-127: LGTM! Bias handling correctly integrated in matmul_lora.

The function properly applies bias after the matrix multiplication and LoRA operations, which is the correct order. The parameter reordering to group weight and bias together improves readability.
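The ordering the review approves is `out = X W^T + s * (X A^T) B^T + b`, with shapes X `[n, in]`, W `[out, in]`, A `[rank, in]`, B `[out, rank]` and b `[out]`. The following pure-Python sketch illustrates that formula. `matmul_t` and `matmul_lora_sketch` are illustrative names, not axolotl's `matmul_lora`, which works on (possibly quantized) torch tensors.

```python
from typing import List, Optional

Matrix = List[List[float]]

def matmul_t(X: Matrix, W: Matrix) -> Matrix:
    """Return X @ W^T for X [n, k] and W [m, k]."""
    return [[sum(x * w for x, w in zip(row, w_row)) for w_row in W] for row in X]

def matmul_lora_sketch(X: Matrix, W: Matrix, b: Optional[List[float]],
                       A: Optional[Matrix], B: Optional[Matrix], s: float) -> Matrix:
    out = matmul_t(X, W)                    # base projection X W^T -> [n, out]
    if A is not None and B is not None:
        lora = matmul_t(matmul_t(X, A), B)  # (X A^T) B^T -> [n, out]
        out = [[o + s * l for o, l in zip(orow, lrow)] for orow, lrow in zip(out, lora)]
    if b is not None:
        # The base-layer bias is added once, after both terms, and is not scaled by s.
        out = [[o + bj for o, bj in zip(orow, b)] for orow in out]
    return out

X = [[1.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [0.0]]
print(matmul_lora_sketch(X, W, [10.0, 20.0], A, B, s=0.5))  # [[12.5, 22.0]]
```

Adding the bias inside the LoRA branch instead would scale it by `s` and change the base layer's output, so applying it last is what keeps the patched layer equivalent to `base_layer(X) + lora(X)`.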

216-244: Verify: Bias gradients are not computed in backward pass.

The backward method returns None for all bias gradients (lines 223, 224, 230, 236, 243). This means biases won't be updated during training. Please confirm this is intentional - typically in LoRA, base model parameters (including biases) are frozen while only LoRA parameters are trained.
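Returning `None` is the autograd convention for "no gradient for this input". For comparison, if a bias were trainable in `Y = X W^T + b`, its gradient would be `dY` summed over every row (batch and token) dimension, because b is broadcast to each row. The pure-Python sketch below shows both cases. `bias_grad` and `backward_bias` are hypothetical helpers, not functions from the PR.

```python
from typing import List, Optional

def bias_grad(dY: List[List[float]]) -> List[float]:
    # For Y = X W^T + b with b broadcast across rows, dL/db_j = sum_i dL/dY_ij.
    return [sum(col) for col in zip(*dY)]

def backward_bias(dY: List[List[float]], bias_requires_grad: bool) -> Optional[List[float]]:
    # The reviewed kernels take the frozen path: the base bias gets no gradient.
    return bias_grad(dY) if bias_requires_grad else None

dY = [[1.0, 2.0], [3.0, 4.0]]
print(bias_grad(dY))  # [4.0, 6.0]
```

Because standard LoRA freezes the base layer, the `None` path is the expected one. The summed-gradient path would only matter if the base biases were unfrozen.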

391-431: LGTM! Bias parameters correctly propagated in SwiGLU MLP.

The function properly extracts bias tensors and passes them to the LoRA_MLP autograd function.

434-473: LGTM! Consistent bias handling in GEGLU MLP.

The GEGLU variant maintains consistency with the SwiGLU implementation for bias handling.
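In a SwiGLU MLP, each of the gate, up and down projections carries its own bias, so all three must be propagated. The pure-Python sketch below shows that structure, computing `silu(gate(x)) * up(x)` and then `down(...)`. `lora_linear` and `swiglu_mlp` are hypothetical names, not axolotl's `apply_lora_mlp_swiglu`, and this is not the fused kernel the PR actually implements.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]

def lora_linear(X: Matrix, W: Matrix, b: Optional[List[float]] = None,
                A: Optional[Matrix] = None, B: Optional[Matrix] = None,
                s: float = 1.0) -> Matrix:
    # X W^T (+ s * (X A^T) B^T) (+ b): one bias-aware projection.
    def mm_t(P: Matrix, Q: Matrix) -> Matrix:
        return [[sum(p * q for p, q in zip(r, qr)) for qr in Q] for r in P]
    out = mm_t(X, W)
    if A is not None and B is not None:
        lora = mm_t(mm_t(X, A), B)
        out = [[o + s * l for o, l in zip(ro, rl)] for ro, rl in zip(out, lora)]
    if b is not None:
        out = [[o + bj for o, bj in zip(ro, b)] for ro in out]
    return out

def silu(z: float) -> float:
    return z / (1.0 + math.exp(-z))

def swiglu_mlp(X: Matrix, gate: dict, up: dict, down: dict) -> Matrix:
    # gate/up/down hold lora_linear keyword args, each with its own bias "b".
    g = lora_linear(X, **gate)
    u = lora_linear(X, **up)
    h = [[silu(gv) * uv for gv, uv in zip(gr, ur)] for gr, ur in zip(g, u)]
    return lora_linear(h, **down)
```

A GEGLU variant would differ only in replacing `silu` with GELU. The bias handling is identical, which matches the consistency the review notes.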

551-691: Consistent with MLP: bias gradients not computed.

Similar to LoRA_MLP, the backward pass returns None for all bias gradients. This aligns with the typical LoRA training approach where base model parameters remain frozen.

694-733: LGTM! QKV bias parameters properly handled.

The function correctly extracts and passes bias tensors to the LoRA_QKV autograd function.

736-824: LGTM! Output projection bias support implemented consistently.

The LoRA_O class follows the same pattern as LoRA_QKV and LoRA_MLP, with proper bias propagation in forward and None gradients in backward.

826-839: LGTM! Output projection application with bias support.

The function correctly handles bias extraction and propagation.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/axolotl/kernels/lora.py (1)

82-105: Update documentation to include bias parameter

The function signature includes a `b` parameter for the bias, but it is not documented in the Args section.

     Args:
         X: Input tensor [*, in_features]
         W: Base weight matrix [out_features, in_features]
+        b: Optional bias tensor [out_features]
         W_quant: Quantization state for W
         A: LoRA A matrix [rank, in_features]
         B: LoRA B matrix [out_features, rank]
         s: LoRA scaling factor
         out: Optional output tensor for inplace operations

🔇 Additional comments (11)

src/axolotl/monkeypatch/lora_kernels.py (2)

420-421: LGTM!

The warning message correctly reflects the updated requirements for patching output projections.

437-438: LGTM!

The warning message correctly reflects the updated requirements for patching MLP layers.

src/axolotl/kernels/lora.py (9)

25-79: LGTM! Bias extraction properly implemented

The function correctly extracts and returns the bias tensor in all code paths, with appropriate documentation updates.

135-213: LGTM! Forward pass correctly handles biases

The forward method properly accepts and propagates bias tensors through all projection computations.

391-473: LGTM! MLP application functions correctly handle biases

Both apply_lora_mlp_swiglu and apply_lora_mlp_geglu consistently extract and propagate bias tensors for all projections.

486-549: LGTM! QKV forward pass properly handles biases

The forward method correctly accepts, propagates, and saves bias tensors for all three projections.

551-691: LGTM! QKV backward pass correctly handles gradients

The backward method properly returns None for bias gradients and includes necessary type annotations.

694-733: LGTM! QKV application function correctly handles biases

The function consistently extracts and propagates bias tensors for all three projections.

739-775: LGTM! Output projection forward pass handles bias correctly

The forward method properly accepts and propagates the bias tensor.

777-839: LGTM! Output projection backward and application correctly handle biases

Both the backward method and apply_lora_o function properly handle bias tensors consistently with other projections.

349-351: `dequantize` correctly restores the original weight orientation, so passing `.t()` is unnecessary

The `dequantize` function always returns a tensor of shape `quant_state.shape` (the original weight shape) and only re-transposes internally for 1×N inputs. Whether you call `dequantize(W.t(), quant)` or `dequantize(W, quant)`, the output matches `quant_state.shape`, which makes explicit transposes redundant. The change at line 349 that drops `.t()` and uses

    gate_weight = dequantize(gate_weight, gate_quant)
    dX += grad_gate @ gate_weight

is therefore correct and preserves the intended dimensions. No further action required.

Likely an incorrect or invalid review comment.
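The comment rests on a shape argument. Assume, as it states, that `dequantize` returns W in its original `[out_features, in_features]` layout. Then the input-gradient step `dX = dY @ W` (forward `Y = X W^T`) needs no extra transpose. The pure-Python check below makes this concrete, with plain lists standing in for the dequantized tensors. `matmul` and `transpose` are illustrative helpers.

```python
from typing import List

Matrix = List[List[float]]

def matmul(P: Matrix, Q: Matrix) -> Matrix:
    """Plain P @ Q for P [n, k] and Q [k, m], with a shape check."""
    if len(P[0]) != len(Q):
        raise ValueError(f"shape mismatch: [{len(P)}, {len(P[0])}] @ [{len(Q)}, {len(Q[0])}]")
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]

# Forward: Y = X W^T with W in its original [out_features, in_features] layout.
# Backward w.r.t. X: dX = dY @ W, so W is needed in that same layout.
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # [out=2, in=3]
dY = [[1.0, 1.0]]                        # [n=1, out=2]
print(matmul(dY, W))                     # [[5.0, 7.0, 9.0]], shape [n, in]
```

Multiplying by `transpose(W)` instead raises the shape-mismatch error. This is why an extra `.t()` on an already correctly oriented weight would break the gradient.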

djsaunde requested a review from winglian August 6, 2025 18:29

djsaunde commented Aug 6, 2025

Comment thread src/axolotl/loaders/model.py Outdated

djsaunde added 3 commits August 6, 2025 18:31

lora kernels bias support

fcdae99

revert rename

0a80a2b

nit

ca2164a

djsaunde force-pushed the lora-kernels-bias branch from 78e6c29 to ca2164a, August 6, 2025 18:31

coderabbitai Bot reviewed Aug 6, 2025

Comment thread src/axolotl/monkeypatch/lora_kernels.py Outdated

lint, tests

d7d365d

winglian approved these changes Aug 6, 2025

satisfying the rabbit

7c5eaa2

djsaunde merged commit d09290f into main Aug 7, 2025
8 of 9 checks passed

djsaunde deleted the lora-kernels-bias branch August 7, 2025 00:20

coderabbitai Bot mentioned this pull request Sep 25, 2025

update lora kernels docs #3186

Closed

coderabbitai Bot mentioned this pull request Mar 22, 2026

feat: LoRA kernel support for bias, dropout, dora, embeddings #3528

Merged

coderabbitai Bot mentioned this pull request May 15, 2026

fix AssertionError: Original QKV code not found #3657

Merged

djsaunde merged 5 commits into main from lora-kernels-bias
