Lora kernels fix#2732
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughThe changes update type annotations and imports in the model loader to support Changes
Sequence Diagram(s)sequenceDiagram
participant Config
participant ModelLoader
participant Tokenizer
participant Model
participant PatchManager
Config->>ModelLoader: Provide configuration
ModelLoader->>Tokenizer: load_tokenizer(config)
ModelLoader->>Model: load_model(config)
ModelLoader->>PatchManager: Apply patches to Model
PatchManager->>Model: Patch self-attention/LoRA kernels
ModelLoader-->>Config: Return loaded Model and config
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (2)
src/axolotl/loaders/model.py (1)
148-154: 🛠️ Refactor suggestionReturn annotation no longer covers every path
load()can still legitimately return aPeftModelorPeftMixedModelwhen
self.qlora_fsdpisTrue(model is not merged back)cfg.merge_loraisFalse- a mixed PEFT model is produced for IPO/DPO reference models, etc.
Those concrete types were removed from the signature, while
PeftModelForCausalLMwas added (which is already a subclass ofPeftModel).
To keep static checkers (e.g. mypy, pyright) happy, include all possibilities:- def load(self) -> tuple[PreTrainedModel | PeftModelForCausalLM, PeftConfig | None]: + def load( + self, + ) -> tuple[ + PreTrainedModel | PeftModel | PeftMixedModel | PeftModelForCausalLM, + PeftConfig | None, + ]:(or simply
PreTrainedModel|PeftModel|PeftMixedModel).
Clarifies intent and avoids false-positive type errors downstream.tests/e2e/patched/lora_kernels/test_lora_kernel_patching.py (1)
473-504:⚠️ Potential issueTest leaves global patch in place – may create cross-test bleed-through
ModelLoader.load()patches the classattention_cls.forward, and the test asserts on that – but never restores the original implementation afterwards.
If another test in the same session relies on the un-patched forward (or performs its own patch/unpatch cycle) you can get flaky failures.Add a
try/finally(oraddfinalizer) to put the original method back:- model, _ = ModelLoader(cfg, tokenizer).load() + model, _ = ModelLoader(cfg, tokenizer).load() ... - assert found_patched_attn + assert found_patched_attn + + # cleanup – restore original forward to avoid polluting other tests + attention_cls.forward = original_forward_method + if hasattr(attention_cls, "_original_forward"): + delattr(attention_cls, "_original_forward")Small change, big reduction in hidden coupling between tests.
🧹 Nitpick comments (2)
src/axolotl/loaders/model.py (1)
18-24: Import list getting long – consider localising PEFT-only importsThe PEFT-specific classes (
PeftConfig,PeftModel*,prepare_model_for_kbit_training, …) are now imported at module level.
Because they are only used in a handful of helper methods, moving them into the functions that actually need them (lazy import) keeps the import surface smaller and avoids importing PEFT when the caller never asks for adapter support (pure inference on a base model).
Minor, but it can trim CLI start-up latency and helps with optional-dependency hygiene.src/axolotl/loaders/patch_manager.py (1)
173-179: Guard-clause good, but feature toggle names diverge from testsThe new helper checks
cfg.lora_qkv_kernelandcfg.lora_o_kernelonly.
Tests expectcfg.lora_mlp_kernelto be auto-enabled as well; that switch is handled elsewhere but may confuse future readers.Consider updating the doc-string to reflect that only self-attention kernels (QKV & O) are patched here, whereas MLP kernels are handled later. A tiny clarification now avoids hunting through call-stacks later.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/axolotl/loaders/model.py(2 hunks)src/axolotl/loaders/patch_manager.py(3 hunks)tests/e2e/patched/lora_kernels/test_lora_kernel_patching.py(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/axolotl/loaders/model.py (1)
src/axolotl/utils/schemas/peft.py (1)
PeftConfig(17-20)
src/axolotl/loaders/patch_manager.py (2)
src/axolotl/integrations/base.py (2)
cfg(317-318)cfg(321-322)tests/test_exact_deduplication.py (1)
cfg(221-235)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: PyTest (3.11, 2.5.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: pre-commit
- GitHub Check: pre-commit
🔇 Additional comments (2)
src/axolotl/loaders/patch_manager.py (2)
60-66: Patch order refactor looks good, but double-check idempotency
_patch_loss_llama()and_apply_self_attention_lora_patch()are now invoked unconditionally during the pre-load phase.
Because both monkey-patch global classes/functions, a second invocation (e.g. loading two models in the same process or running pytest with “-n auto”) could overwrite_original_forwarda second time or wrap an already-patched function.Please ensure each patch helper exits early when it detects the target has already been patched (many of them do, some still don’t).
This keeps multi-model or repeated test runs safe.
236-238: Early return prevents unnecessary work – niceThe new guard short-circuits for non-LLaMA models, avoiding wasted imports.
Looks correct and desirable.
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
NanoCode012
left a comment
There was a problem hiding this comment.
Was this issue caused from some accidental rebase+merge?
Description
I realized that a recent change broke the actual application of the self attention code monkeypatches related to LoRA kernels. This PR fixes that.
Also simplified the application of some patches a bit.
Motivation and Context
How has this been tested?
Add pytest, manually tested with and without changes.
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)
Summary by CodeRabbit
Summary by CodeRabbit