Skip to content

bump hf deps#2735

Merged
winglian merged 32 commits into
mainfrom
hf-upgrades
Jun 5, 2025
Merged

bump hf deps#2735
winglian merged 32 commits into
mainfrom
hf-upgrades

Conversation

@winglian
Copy link
Copy Markdown
Collaborator

@winglian winglian commented May 28, 2025

Summary by CodeRabbit

  • New Features
    • Added a test to verify LoRA packing with packed datasets, ensuring correct data loader and sampler behavior.
    • Added a multi-GPU test for GRPO training with LoRA and sequence parallelism using vLLM integration.
    • Extended VLLM server to support reasoning features via new CLI arguments and worker logic.
  • Bug Fixes
    • Updated regular expressions for LoRA target modules across multiple configuration files and documentation to match the correct model hierarchy.
    • Adjusted logic for setting batch packing options in training arguments for more predictable behavior.
    • Prevented appending null reference models during sequence parallel training.
  • Refactor
    • Unified and streamlined data loader creation and sampler management for training and evaluation.
    • Improved state dict loading for sharded FSDP2 models with enhanced dtype and contiguity handling.
    • Removed legacy monkeypatches for Gemma3 and Mllama Flash Attention 2, simplifying patch management.
    • Simplified LoRA kernel patching logic to better handle model attribute hierarchies.
    • Moved sequence parallel group initialization to training start for GRPO trainer.
  • Chores
    • Upgraded several Python dependencies and the modal package version in CI workflows.
    • Updated installation instructions for the cut-cross-entropy package.
    • Removed legacy and unused monkeypatch files and related documentation entries.
  • Tests
    • Increased loss threshold in knowledge distillation tests.
    • Restricted optimizer tests to run only with supported PyTorch versions.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2025

Walkthrough

This update removes monkeypatches for Gemma3 and Mllama attention, refactors dataloader construction in the AxolotlTrainer, adjusts LoRA target module regexes across configs and docs, updates dependencies and test thresholds, and improves test coverage for LoRA packing. It also refines multipack batch handling logic and patches for FSDP2 state dict loading. Additionally, it extends VLLM serve functionality with reasoning options and adds sequence parallelism support in GRPO training.

Changes

File(s) Change Summary
src/axolotl/monkeypatch/gemma3.py, src/axolotl/monkeypatch/attention/mllama.py Deleted monkeypatch files for Gemma3 forward and Mllama attention with Flash Attention 2.
src/axolotl/loaders/patch_manager.py Removed logic applying Gemma3 and Mllama monkeypatches.
src/axolotl/integrations/cut_cross_entropy/monkeypatch/mllama.py Removed docstring decorators from forward methods, keeping functional logic unchanged.
src/axolotl/core/trainers/base.py Refactored dataloader creation into unified _get_dataloader method; updated sampler handling and removed old dataloader methods.
src/axolotl/core/builders/causal.py Changed logic for setting multipack_real_batches with explicit None check.
src/axolotl/monkeypatch/lora_kernels.py Adjusted internal logic for accessing model layers in LoRA kernel patching to use model.model first.
src/axolotl/monkeypatch/accelerate/fsdp2.py Refactored fsdp2_load_full_state_dict for improved dtype and contiguity handling; removed set_state_dict_type patch.
scripts/cutcrossentropy_install.py Updated cut-cross-entropy install source repository URL and commit hash.
requirements.txt Bumped versions for liger-kernel, huggingface_hub, transformers, accelerate, datasets, deepspeed, trl, and hf_xet.
.github/workflows/multi-gpu-e2e.yml, .github/workflows/tests.yml Updated modal package version from 0.71.8 to 1.0.2.
_quarto.yml Removed monkeypatch.attention.mllama entry from config.
docs/multimodal.qmd, examples/llava/lora-7b.yaml, examples/gemma3/gemma-3-4b-qlora.yml,
examples/gemma3/gemma-3-4b-vision-qlora.yml, examples/llama-3-vision/lora-11b.yaml,
examples/mistral/mistral-small-3.1-24B-lora.yml, examples/pixtral/lora-12b.yml,
tests/e2e/test_llama_vision.py Updated lora_target_modules regex to use model.language_model.layers... instead of language_model.model.layers....
tests/e2e/integrations/test_kd.py Increased training loss threshold in assertion from 1.2 to 1.4.
tests/e2e/test_optimizers.py Added require_torch_2_6_0 decorator to test_came_pytorch.
tests/test_packed_dataset.py Added new test for LoRA packing verifying sampler and data collator types and batch shapes.
tests/e2e/multigpu/solo/test_grpo.py Added new test test_llama_lora_sp for GRPO with sequence parallelism and vLLM integration.
src/axolotl/core/trainers/grpo/__init__.py, src/axolotl/core/trainers/grpo/args.py, Added sequence_parallel_degree config field; updated GRPO trainer args and added sequence parallel group initialization in trainer.
src/axolotl/core/trainers/grpo/trainer.py Added SP group, rank, and world size initialization in AxolotlGRPOSequenceParallelTrainer.
src/axolotl/train.py Modified logic to append ref_model only if it exists and is truthy when sequence parallelism is enabled.
src/axolotl/cli/args.py, src/axolotl/utils/schemas/vllm.py, src/axolotl/cli/vllm_serve.py Added enable_reasoning and reasoning_parser fields for VLLM serve; patched vLLM worker to support reasoning options.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant AxolotlTrainer
    participant Dataset
    participant Sampler
    participant DataCollator
    participant Accelerator

    User->>AxolotlTrainer: Request train/eval dataloader
    AxolotlTrainer->>Dataset: Remove "length" column if present
    AxolotlTrainer->>AxolotlTrainer: Remove unused columns (based on mode and packing)
    AxolotlTrainer->>DataCollator: Select appropriate collator
    AxolotlTrainer->>Sampler: Create sampler (if needed)
    AxolotlTrainer->>AxolotlTrainer: Build dataloader params
    AxolotlTrainer->>Accelerator: Prepare dataloader
    Accelerator-->>AxolotlTrainer: Return prepared dataloader
    AxolotlTrainer-->>User: Return dataloader
Loading

Possibly related PRs

  • models.py -> loaders/ module refactor #2680: Refactored and split the large axolotl.utils.models module into multiple smaller modules under axolotl.loaders, directly relating to modularization and loader logic touched in this PR.
  • Fix: RL base feature parity #2133: Modularized and refactored trainer builder code by splitting into multiple builder modules, related to the structural refactoring in this PR.

Suggested reviewers

  • SalmanMohammadi
  • NanoCode012

Poem

🐇 Hopping through code with nimble feet,
Layers reordered, regex neat.
Monkeypatches bid goodbye,
Dataloaders now unify.
Tests grow stronger, patches lean,
In Axolotl's code serene.
🐰✨—A rabbit’s joy in every scene!

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2025

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/axolotl/core/trainers/base.py (1)

182-256: Consider breaking down this method for better maintainability.

While the consolidation of dataloader creation is beneficial, the method handles many concerns. Consider extracting some logic into helper methods for better readability and testing.

Potential extractions:

  • Dataset preprocessing logic (lines 193-206)
  • Persistent workers handling (lines 233-248)
  • Sampler configuration (lines 216-230)

This would make the main method more focused and easier to understand.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a925ee and 9b75528.

📒 Files selected for processing (2)
  • src/axolotl/core/trainers/base.py (6 hunks)
  • tests/e2e/integrations/test_kd.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/e2e/integrations/test_kd.py (1)
tests/conftest.py (1)
  • temp_dir (414-419)
src/axolotl/core/trainers/base.py (2)
src/axolotl/core/trainers/grpo/trainer.py (1)
  • _get_train_sampler (141-161)
src/axolotl/core/trainer_builder.py (4)
  • train_dataset (132-133)
  • train_dataset (136-137)
  • eval_dataset (140-141)
  • eval_dataset (144-145)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: pre-commit
  • GitHub Check: pre-commit
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
🔇 Additional comments (4)
tests/e2e/integrations/test_kd.py (1)

93-93: Threshold adjustment looks reasonable.

The increase from 1.2 to 1.4 aligns with the dependency upgrades that may affect training dynamics.

src/axolotl/core/trainers/base.py (3)

9-10: Import additions are appropriate.

The new imports support the refactored dataloader creation logic and improved type hints.

Also applies to: 22-23


116-131: Good improvement to method flexibility.

The optional dataset parameter and proper validation using has_length utility enhance the method's robustness and reusability.


291-300: Clean refactoring to use the consolidated dataloader method.

The changes properly leverage _get_dataloader while maintaining the necessary data collator management.

Comment thread src/axolotl/core/trainers/base.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/axolotl/core/trainers/base.py (1)

199-201: Simplify nested if statements.

The nested if statements can be combined into a single condition for better readability.

-if isinstance(train_dataset, datasets.Dataset):
-    if self.args.sample_packing and not self.args.pretraining:
-        train_dataset = train_dataset.remove_columns(["length"])
+if isinstance(train_dataset, datasets.Dataset) and self.args.sample_packing and not self.args.pretraining:
+    train_dataset = train_dataset.remove_columns(["length"])
🧰 Tools
🪛 Ruff (0.11.9)

199-200: Use a single if statement instead of nested if statements

(SIM102)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b75528 and 4c0a6bf.

📒 Files selected for processing (1)
  • src/axolotl/core/trainers/base.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/core/trainers/base.py (2)
src/axolotl/core/trainers/grpo/trainer.py (2)
  • _get_train_sampler (141-161)
  • get_train_dataloader (225-250)
src/axolotl/core/trainer_builder.py (4)
  • train_dataset (132-133)
  • train_dataset (136-137)
  • eval_dataset (140-141)
  • eval_dataset (144-145)
🪛 Ruff (0.11.9)
src/axolotl/core/trainers/base.py

23-23: transformers.trainer_utils.seed_worker imported but unused

Remove unused import: transformers.trainer_utils.seed_worker

(F401)


199-200: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: pre-commit
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: pre-commit
🔇 Additional comments (5)
src/axolotl/core/trainers/base.py (5)

116-152: LGTM! Good improvement to method flexibility.

The updated method signature with explicit dataset parameter and proper null checks improves the method's flexibility and error handling.


154-186: LGTM! Consistent with the train sampler changes.

The explicit dataset parameter and null check improve error handling and API consistency.


212-268: Excellent refactoring with improved error handling and optimization!

The changes include:

  • Proper error handling for missing eval dataset
  • Smart optimization to reuse persistent eval dataloaders
  • Correct data collator switching and restoration
  • Consistent use of parent class's _get_dataloader method

This implementation is more robust and maintainable.


270-278: LGTM! Good use of the utility function.

Using has_length from transformers is the proper way to check dataset validity.


280-317: Well-implemented benchmarking dataloader with proper error handling!

The rewritten method includes:

  • Clear documentation
  • Proper validation of the benchmark dataset
  • Correct data collator switching pattern
  • Appropriate handling of iterable vs non-iterable datasets

The implementation is thorough and maintainable.

Comment thread src/axolotl/core/trainers/base.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/axolotl/core/trainers/base.py (1)

23-23: Unused import issue resolved.

The seed_worker import is now properly used in line 236 within the _get_dataloader method, resolving the previously flagged unused import issue.

🧹 Nitpick comments (1)
src/axolotl/core/trainers/base.py (1)

261-264: Consider simplifying nested conditional logic.

The static analysis tool suggests combining these nested if statements for better readability.

-        if isinstance(train_dataset, datasets.Dataset):
-            if self.args.sample_packing and not self.args.pretraining:
-                train_dataset = train_dataset.remove_columns(["length"])
+        if (isinstance(train_dataset, datasets.Dataset) 
+            and self.args.sample_packing 
+            and not self.args.pretraining):
+            train_dataset = train_dataset.remove_columns(["length"])
🧰 Tools
🪛 Ruff (0.11.9)

261-262: Use a single if statement instead of nested if statements

(SIM102)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c0a6bf and 659d6f6.

📒 Files selected for processing (1)
  • src/axolotl/core/trainers/base.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/core/trainers/base.py (2)
src/axolotl/core/trainers/grpo/trainer.py (2)
  • _get_train_sampler (141-161)
  • get_train_dataloader (225-250)
src/axolotl/core/trainer_builder.py (4)
  • train_dataset (132-133)
  • train_dataset (136-137)
  • eval_dataset (140-141)
  • eval_dataset (144-145)
🪛 Ruff (0.11.9)
src/axolotl/core/trainers/base.py

261-262: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: pre-commit
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: pre-commit
🔇 Additional comments (10)
src/axolotl/core/trainers/base.py (10)

10-10: LGTM: Enhanced type annotations.

Good addition of more specific typing imports to improve type safety and code clarity.


15-15: LGTM: Added dataset availability check.

The import of is_datasets_available utility function enhances compatibility checking, which is used appropriately in the _get_dataloader method.


47-48: LGTM: Added necessary type annotations and caching attribute.

The explicit data_collator type annotation and _eval_dataloaders dictionary for caching persistent evaluation dataloaders are well-designed additions that improve type safety and performance.


118-156: LGTM: Improved sampler method with explicit dataset parameter.

The refactoring of _get_train_sampler to accept an explicit train_dataset parameter enhances flexibility and error handling. The method now properly handles None cases and maintains backward compatibility by defaulting to self.train_dataset.


158-190: LGTM: Enhanced eval sampler with required dataset parameter.

The _get_eval_sampler method correctly requires a non-None eval_dataset parameter, improving error handling and making dependencies explicit. The logic for handling multipack sampling is clear and consistent.


192-251: LGTM: Excellent unified dataloader creation method.

The new _get_dataloader method consolidates dataloader creation logic effectively. Key improvements include:

  • Unified parameter handling for batch size, samplers, and workers
  • Proper dataset compatibility checking with is_datasets_available()
  • Support for persistent workers and prefetch factors
  • Intelligent caching of eval dataloaders to prevent fork bombs
  • Clear separation of training vs non-training configurations

The implementation is well-structured and handles edge cases appropriately.


253-272: LGTM: Simplified training dataloader with improved error handling.

The refactored get_train_dataloader method properly validates the training dataset existence and uses the new unified _get_dataloader method. The dataset preprocessing for sample packing is handled correctly.

🧰 Tools
🪛 Ruff (0.11.9)

261-262: Use a single if statement instead of nested if statements

(SIM102)


274-336: LGTM: Comprehensive evaluation dataloader with persistent caching.

The refactored get_eval_dataloader method includes excellent improvements:

  • Proper validation of evaluation dataset availability
  • Intelligent reuse of cached persistent dataloaders to prevent resource waste
  • Correct handling of data collator switching for different evaluation modes
  • Proper restoration of original data collators after temporary switches

The logic for handling different sample packing scenarios is comprehensive and correct.


338-346: LGTM: Improved bench sampler with proper validation.

The updated _get_bench_sampler method now uses has_length utility for proper dataset validation, which is more robust than basic None checking.


348-401: LGTM: Comprehensive benchmarking dataloader implementation.

The completely rewritten get_bench_dataloader method is well-structured with:

  • Proper dataset validation with clear error messages
  • Correct data collator management with save/restore pattern
  • Comprehensive parameter handling including conditional parameters
  • Proper sampler integration for different dataset types
  • Good documentation with clear method signature

The implementation follows established patterns from the other dataloader methods and handles edge cases appropriately.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/axolotl/core/trainers/base.py (1)

23-23: The unused import concern is now resolved.

The seed_worker import is actually used on line 236 in the worker initialization function, so the previous review comment about removing this unused import is no longer applicable.

🧹 Nitpick comments (1)
src/axolotl/core/trainers/base.py (1)

202-206: Remove redundant batch_size assignment.

The batch_size assignment is redundant since self.args.sample_packing is always True in this elif block (it's part of the condition on line 199). This will always assign batch_size to itself.

-        batch_size = (
-            batch_size
-            if self.args.sample_packing
-            else self.args.per_device_eval_batch_size
-        )

Remove this redundant assignment since the condition will always be true in this context.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d68e4d and 76eaffc.

📒 Files selected for processing (2)
  • src/axolotl/core/trainers/base.py (4 hunks)
  • tests/e2e/test_llama_vision.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: pre-commit
  • GitHub Check: pre-commit
🔇 Additional comments (8)
tests/e2e/test_llama_vision.py (1)

89-89: LGTM! Consistent module path update in multimodal test.

This change matches the update in the text-only test method and maintains consistency across both test configurations.

src/axolotl/core/trainers/base.py (7)

116-145: Good refactoring to make the sampler method more flexible.

The updated _get_train_sampler method now accepts an optional train_dataset parameter, making it more flexible and allowing explicit dataset passing to the multipack sampler. The logic correctly handles different sampling strategies.


147-173: Improved evaluation sampler with explicit dataset parameter.

The _get_eval_sampler method now properly accepts an eval_dataset parameter and uses it directly instead of relying on internal state. This makes the method more predictable and testable.


186-186: Clear data collator selection logic.

The conditional assignment of data collator based on training mode is straightforward and correct.


223-230: Well-implemented sampler handling logic.

The method correctly handles both regular samplers and batch samplers, with proper parameter exclusion when using batch samplers. The mutually exclusive handling of batch_size and batch_sampler is implemented correctly.


234-239: Proper worker initialization for training.

The worker initialization with seeding is correctly applied only for training dataloaders, using the appropriate parameters for distributed training.


248-256: Good implementation of persistent dataloader storage.

The logic for storing non-prepared dataloaders for evaluation with persistent workers is well-implemented, handling both initialization and updates of the _eval_dataloaders dictionary properly.


175-258: Excellent consolidation of dataloader creation logic.

This new _get_dataloader method successfully consolidates the previously separate dataloader creation logic into a single, flexible method. The method handles:

  • Appropriate data collator selection
  • Dataset column management based on training mode and sample packing
  • Comprehensive dataloader parameter configuration
  • Proper sampler integration
  • Persistent dataloader storage for evaluation

The refactoring improves code maintainability and reduces duplication while supporting advanced features like sample packing.

Comment thread tests/e2e/test_llama_vision.py Outdated
"lora_alpha": 16,
"lora_dropout": 0.05,
"lora_target_modules": r"language_model.model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj",
"lora_target_modules": r"language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

LGTM! Module path update aligns with codebase conventions.

The removal of .model from the LoRA target modules path correctly aligns with the updated module structure.


🏁 Script executed:

#!/bin/bash
# Description: Verify the correct module path structure for Llama Vision models
# Expected: Find references to "language_model.layers" without ".model" component

echo "Checking for LoRA target module patterns in the codebase..."
rg -A 3 -B 3 "lora_target_modules.*language_model\.layers" --type py

echo -e "\nChecking for any remaining references to old pattern..."
rg "language_model\.model\.layers" --type py

Length of output: 1589


Inconsistent LoRA module path in monkeypatch

The end-to-end tests now target language_model.layers…, but src/axolotl/monkeypatch/lora_kernels.py still uses model.language_model.model.layers. This mismatch will break LoRA injection at runtime.

Please update the monkeypatch to match the new module path:

• File: src/axolotl/monkeypatch/lora_kernels.py
Change:

- layers = model.language_model.model.layers
+ layers = model.language_model.layers

Ensure any other occurrences of language_model.model.layers are removed or updated to language_model.layers.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/axolotl/monkeypatch/lora_kernels.py, locate all occurrences of the module
path language_model.model.layers and update them to language_model.layers to
match the new module structure used in the tests and codebase. Ensure no
references to the old path remain to prevent runtime errors in LoRA injection.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f76af1 and 39ecef3.

📒 Files selected for processing (3)
  • docs/multimodal.qmd (1 hunks)
  • examples/llama-3-vision/lora-11b.yaml (1 hunks)
  • tests/e2e/test_llama_vision.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/llama-3-vision/lora-11b.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/e2e/test_llama_vision.py

Comment thread docs/multimodal.qmd
# load_in_8bit: true
adapter: lora
lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

LGTM! LoRA target module path correction looks good.

The updated regex pattern correctly targets the model hierarchy with model.language_model.layers instead of the previous pattern. This change aligns with the broader codebase updates mentioned in the PR summary.

Please verify that this pattern matches the actual model architecture by running:


🏁 Script executed:

#!/bin/bash
# Description: Verify the LoRA target module pattern is consistent across the codebase
# Expected: All lora_target_modules should use the same pattern format

# Search for all lora_target_modules configurations
rg -A 2 -B 2 "lora_target_modules.*model\.language_model\.layers"

# Also check for any remaining old patterns that might need updating
rg -A 2 -B 2 "lora_target_modules.*language_model\.model\.layers"

Length of output: 3449


Fix outdated LoRA module patterns in example configs

The update to use model.language_model.layers is correct, but several example YAMLs still reference the old pattern. Please update these to maintain consistency:

• examples/pixtral/lora-12b.yml
• examples/mistral/mistral-small-3.1-24B-lora.yml
• examples/llava/lora-7b.yaml
• examples/gemma3/gemma-3-4b-vision-qlora.yml
• examples/gemma3/gemma-3-4b-qlora.yml

Suggested change in each file (replace the old regex):

- lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
+ lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
- lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
+ lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
🤖 Prompt for AI Agents
In docs/multimodal.qmd at line 46, the lora_target_modules regex pattern has
been correctly updated to use model.language_model.layers. To maintain
consistency across the codebase, update the same regex pattern in the example
YAML configuration files listed: examples/pixtral/lora-12b.yml,
examples/mistral/mistral-small-3.1-24B-lora.yml, examples/llava/lora-7b.yaml,
examples/gemma3/gemma-3-4b-vision-qlora.yml, and
examples/gemma3/gemma-3-4b-qlora.yml. Replace the old lora_target_modules regex
pattern with the new one matching model.language_model.layers as shown in
docs/multimodal.qmd line 46.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
examples/gemma3/gemma-3-4b-vision-qlora.yml (1)

33-33: Nit: Anchor and escape your regex
To ensure exact matching and prevent unintended partial matches, consider anchoring and escaping:

-lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
+lora_target_modules: '^model\\.language_model\\.layers\\.\\d+\\.(mlp|cross_attn|self_attn)\\.(up|down|gate|q|k|v|o)_proj$'
examples/gemma3/gemma-3-4b-qlora.yml (1)

31-31: Nit: Anchor and escape your regex
For precision, wrap with anchors and escape dots:

-lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
+lora_target_modules: '^model\\.language_model\\.layers\\.\\d+\\.(mlp|cross_attn|self_attn)\\.(up|down|gate|q|k|v|o)_proj$'
examples/pixtral/lora-12b.yml (1)

28-28: Nit: Anchor and escape regex for precision
Consider adding anchors and escaping dots:

-lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
+lora_target_modules: '^model\\.language_model\\.layers\\.\\d+\\.(mlp|cross_attn|self_attn)\\.(up|down|gate|q|k|v|o)_proj$'
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39ecef3 and 3996ab6.

📒 Files selected for processing (5)
  • examples/gemma3/gemma-3-4b-qlora.yml (1 hunks)
  • examples/gemma3/gemma-3-4b-vision-qlora.yml (1 hunks)
  • examples/llava/lora-7b.yaml (1 hunks)
  • examples/mistral/mistral-small-3.1-24B-lora.yml (1 hunks)
  • examples/pixtral/lora-12b.yml (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • examples/llava/lora-7b.yaml
  • examples/mistral/mistral-small-3.1-24B-lora.yml
🔇 Additional comments (5)
examples/gemma3/gemma-3-4b-vision-qlora.yml (1)

33-33: Regex prefix update approved
The lora_target_modules pattern prefix has been updated to model.language_model, correctly reflecting the current module hierarchy.

examples/gemma3/gemma-3-4b-qlora.yml (2)

31-31: Regex prefix update approved
The lora_target_modules regex now correctly uses model.language_model as the prefix, aligning with the code’s attribute path.


31-31:

Details

✅ Verification successful

Verify no residual old patterns
Run a global search to catch any configs still referencing language_model.model:


🏁 Script executed:

rg "lora_target_modules" | grep -n "language_model\\.model"

Length of output: 58


#!/bin/bash
# Search for any leftover references to the old pattern
rg -n "language_model\\.model"

No obsolete language_model.model references detected
A global search for language_model.model across the repository returned no matches, confirming all lora_target_modules patterns have been updated.

examples/pixtral/lora-12b.yml (2)

28-28: Regex prefix update approved
The updated lora_target_modules prefix now matches model.language_model, consistent with other example configurations.


28-28:

Details

✅ Verification successful

Verify consistency across configs
Search for any remaining outdated patterns:


🏁 Script executed:

rg "lora_target_modules" | grep -n "language_model\\.model"

Length of output: 58


All lora_target_modules patterns are consistent across configs
A search for any stray language_model.model occurrences within lora_target_modules yielded no results, so there are no outdated patterns to update.

lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify no residual old patterns
Ensure consistency by searching for any configs still using the old language_model.model prefix:

Let me know if you’d like a sed-based script to batch-update remaining files.


🏁 Script executed:

rg "lora_target_modules" -A1 | grep -B1 "language_model\\.model"

Length of output: 63


🏁 Script executed:

#!/bin/bash
# Search the entire repository for any occurrences of the old prefix
rg -n "language_model\.model"

Length of output: 126


Residual old prefix detected in lora_kernels.py
The search revealed one remaining use of the old language_model.model path:

  • File: src/axolotl/monkeypatch/lora_kernels.py
    Line 346:
    - layers = model.language_model.model.layers
    + layers = model.language_model.layers

Please update this occurrence (and any similar references) to remove the extra .model segment so it aligns with the new target-module pattern.

🤖 Prompt for AI Agents
In examples/gemma3/gemma-3-4b-vision-qlora.yml at line 33, the
lora_target_modules pattern is updated correctly, but there is a residual old
prefix "language_model.model" used in src/axolotl/monkeypatch/lora_kernels.py at
line 346. Locate this line and any similar references in that file and remove
the extra ".model" segment from the path to ensure consistency with the new
pattern. This will align all references to use the updated module path format.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 30, 2025

@github-actions github-actions Bot temporarily deployed to preview May 30, 2025 11:59 Inactive
@salmanmohammadi
Copy link
Copy Markdown
Contributor

I also think this file can be updated. Let me know if you'd like a hand with this.

@NanoCode012
Copy link
Copy Markdown
Collaborator

NanoCode012 commented May 31, 2025

@salmanmohammadi , what needs to be changed in the linked file?

The current error seems to be due to our linked test loads from https://huggingface.co/JackFram/llama-68m which is pytorch bin model https://github.com/axolotl-ai-cloud/axolotl/actions/runs/15346236028/job/43186938948?pr=2735 (Is this our sign to drop py2.5 support?)

@github-actions github-actions Bot temporarily deployed to preview June 2, 2025 09:59 Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e0c791 and 5962849.

📒 Files selected for processing (3)
  • _quarto.yml (0 hunks)
  • src/axolotl/core/builders/causal.py (1 hunks)
  • tests/e2e/test_optimizers.py (2 hunks)
💤 Files with no reviewable changes (1)
  • _quarto.yml
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/e2e/test_optimizers.py (1)
tests/e2e/utils.py (4)
  • check_model_output_exists (150-171)
  • require_torch_2_5_1 (56-65)
  • require_torch_2_6_0 (68-77)
  • with_temp_dir (19-31)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
🔇 Additional comments (2)
tests/e2e/test_optimizers.py (2)

15-20: LGTM: Import addition aligns with test requirements.

The addition of require_torch_2_6_0 to the imports is necessary for the decorator usage in the test_came_pytorch method.


209-209: LGTM: Appropriate version gating for compatibility.

Adding the @require_torch_2_6_0 decorator to the test_came_pytorch method is a good solution that aligns with the PR's dependency bump objectives. This ensures the test only runs when PyTorch 2.6.0+ is available, which is particularly relevant given:

  1. The test uses the "JackFram/llama-68m" model and "came_pytorch" optimizer
  2. The PR comments mentioned errors related to this specific model
  3. The pattern follows existing version requirements used elsewhere in the codebase

This approach maintains test coverage while preventing compatibility issues on older PyTorch versions.

Comment on lines 246 to 250
training_arguments_kwargs["multipack_real_batches"] = (
not self.cfg.flash_attention or self.cfg.multipack_real_batches
self.cfg.multipack_real_batches
if self.cfg.multipack_real_batches is not None
else not self.cfg.flash_attention
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Breaking change: Verify the new multipack_real_batches logic is intentional

This change modifies the precedence for determining multipack_real_batches and could be a breaking change:

Previous behavior: not self.cfg.flash_attention or self.cfg.multipack_real_batches

  • When flash_attention=False, result was always True regardless of multipack_real_batches setting
  • Only respected explicit multipack_real_batches when flash_attention=True

New behavior: Explicit configuration takes priority over derived behavior

  • If multipack_real_batches is set, use that value
  • If multipack_real_batches is None, fall back to not flash_attention

While the new logic is more intuitive and respects user configuration, users with configs like:

flash_attention: false
multipack_real_batches: false

will now get multipack_real_batches=False instead of True.


🏁 Script executed:

#!/bin/bash
# Verify impact of multipack_real_batches logic change
echo "Searching for configurations that might be affected by this change..."

# Look for configs with both flash_attention and multipack_real_batches settings
rg -A 5 -B 5 "flash_attention.*false|multipack_real_batches.*false" --type yaml

# Search for any documentation or comments about this behavior
rg -A 10 -B 5 "multipack_real_batches.*flash_attention|flash_attention.*multipack_real_batches"

# Look for related test files that might need updates
fd "test.*multipack|multipack.*test" --type f --exec cat {}

Length of output: 16809


Document and test explicit multipack_real_batches override with flash_attention=False

The change in src/axolotl/core/builders/causal.py (246–250) only affects users who explicitly set

flash_attention: false  
multipack_real_batches: false

—default behavior (no multipack_real_batches key) remains unchanged. To avoid surprises:

• Update the configuration reference or CLI help (TrainerCliArgs) to call out this override priority.
• Add a unit or E2E test for the case where flash_attention=False and multipack_real_batches=false to ensure users see the intended behavior.
• Consider a note in CHANGELOG or upgrade guide highlighting this breaking change for explicit configs.

🤖 Prompt for AI Agents
In src/axolotl/core/builders/causal.py around lines 246 to 250, the logic for
setting multipack_real_batches now prioritizes the explicit configuration over
the derived value from flash_attention, which is a breaking change for users who
set flash_attention to false and multipack_real_batches to false. To fix this,
update the TrainerCliArgs configuration reference or CLI help to clearly
document that multipack_real_batches overrides flash_attention when explicitly
set. Additionally, add a unit or end-to-end test covering the scenario where
flash_attention is false and multipack_real_batches is false to verify the new
behavior. Finally, include a note in the CHANGELOG or upgrade guide to inform
users about this breaking change.

@github-actions github-actions Bot temporarily deployed to preview June 2, 2025 12:52 Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/axolotl/monkeypatch/accelerate/fsdp2.py (1)

60-97: Consider refactoring to reduce code duplication.

The tensor distribution logic is correct, but the casting and contiguity handling is duplicated between the main process (lines 70-77) and other ranks (lines 89-96). While not critical, this could be simplified.

Consider extracting the common pattern into a helper:

 for param_name in param_names:
     mesh = meta_sharded_sd[param_name].device_mesh
     if accelerator.is_main_process:
         full_param = full_sd[param_name].detach().cuda()
         dist.broadcast(full_param, src=0, group=mesh.get_group())
         sharded_tensor = distribute_tensor(
             full_param, mesh, sharded_sd[param_name].placements
         )
-        to_contiguous, casting_dtype = _infer_parameter_dtype(
-            model,
-            param_name,
-            full_param,
-        )
-        sharded_tensor = _cast_and_contiguous(
-            sharded_tensor, to_contiguous, casting_dtype
-        )
-        sharded_sd[param_name] = sharded_tensor
     else:
         full_tensor = torch.empty(
             sharded_sd[param_name].size(),
             device="cuda",
             dtype=sharded_sd[param_name].dtype,
         )
         dist.broadcast(full_tensor, src=0, group=mesh.get_group())
         sharded_tensor = distribute_tensor(
             full_tensor, mesh, sharded_sd[param_name].placements
         )
-        to_contiguous, casting_dtype = _infer_parameter_dtype(
-            model,
-            param_name,
-            full_tensor,
-        )
-        sharded_tensor = _cast_and_contiguous(
-            sharded_tensor, to_contiguous, casting_dtype
-        )
-        sharded_sd[param_name] = sharded_tensor
+    
+    # Apply dtype casting and contiguity after distribution
+    to_contiguous, casting_dtype = _infer_parameter_dtype(
+        model,
+        param_name,
+        full_param if accelerator.is_main_process else full_tensor,
+    )
+    sharded_tensor = _cast_and_contiguous(
+        sharded_tensor, to_contiguous, casting_dtype
+    )
+    sharded_sd[param_name] = sharded_tensor
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5962849 and 2e12544.

📒 Files selected for processing (1)
  • src/axolotl/monkeypatch/accelerate/fsdp2.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
🔇 Additional comments (5)
src/axolotl/monkeypatch/accelerate/fsdp2.py (5)

21-23: Good documentation update!

The clarification that the model is expected to be on meta device to prevent VRAM spikes is helpful for users of this function.


28-31: LGTM!

The initialization logic correctly retrieves the meta device state dict and prepares for sharded tensor distribution.


33-52: Well-designed helper function with good LoRA support!

The function properly handles both regular parameters and LoRA parameters that aren't standard PyTorch parameters. The float8 dtype checking and casting logic is appropriate.


99-101: Correct use of assign=True for meta device parameters!

The assign=True parameter is essential here since the model parameters are on meta device. This ensures the state dict tensors are assigned directly rather than copied.


194-211: LGTM! Simplified patching aligns with dependency updates.

The removal of set_state_dict_type patching and focus on just the essential patches (fsdp2_load_full_state_dict and get_state_dict) is appropriate for the updated accelerate dependencies.

🧰 Tools
🪛 Ruff (0.11.9)

199-203: Do not call setattr with a constant attribute value. It is not any safer than normal property access.

Replace setattr with assignment

(B010)

Comment thread src/axolotl/monkeypatch/accelerate/fsdp2.py
@github-actions github-actions Bot temporarily deployed to preview June 3, 2025 02:40 Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/e2e/multigpu/solo/test_grpo.py (1)

268-359: Consider improving test consistency and parameterization.

The new test test_llama_lora_sp effectively exercises the sequence parallelism functionality. However, there are some inconsistencies with the existing test pattern:

  1. Missing parameterization: Unlike test_llama_dora which tests with [1, 2] GPUs, this test hardcodes 2 processes (line 348). Consider adding parameterization for consistency.

  2. Configuration differences: The test removes DoRA configuration (peft_use_dora) and adds sequence_parallel_degree: 2, which properly exercises the new functionality from the main file.

Consider applying this diff to maintain consistency with the existing test pattern:

+    @pytest.mark.parametrize(
+        "num_gpus",
+        [2],  # Only test with 2 GPUs since sequence_parallel_degree is 2
+    )
     @require_vllm
-    def test_llama_lora_sp(self, temp_dir):
+    def test_llama_lora_sp(self, temp_dir, num_gpus):
         # ... configuration code ...
                     "--num-processes",
-                    str(2),
+                    str(num_gpus),

Alternatively, if sequence parallelism specifically requires 2 processes, document this requirement in a comment.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e12544 and e75864e.

📒 Files selected for processing (2)
  • src/axolotl/core/trainers/grpo/__init__.py (2 hunks)
  • tests/e2e/multigpu/solo/test_grpo.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/axolotl/core/trainers/grpo/__init__.py (1)
src/axolotl/utils/schemas/trl.py (1)
  • TRLConfig (6-163)
tests/e2e/multigpu/solo/test_grpo.py (3)
tests/e2e/utils.py (1)
  • require_vllm (92-107)
tests/conftest.py (1)
  • temp_dir (414-419)
src/axolotl/utils/dict.py (1)
  • DictDefault (6-38)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
🔇 Additional comments (2)
src/axolotl/core/trainers/grpo/__init__.py (1)

72-73: LGTM! Sequence parallel degree support added correctly.

The logic correctly sets the sequence_parallel_degree parameter when it's greater than 1, which aligns with the test case that uses this feature.

tests/e2e/multigpu/solo/test_grpo.py (1)

265-267: LGTM! Minor formatting improvement.

The parentheses around recursive_kill(vllm_process) are unnecessary but don't affect functionality.

Comment on lines +129 to +131
trl: TRLConfig = cfg.trl
if trl.reward_funcs and isinstance(trl.reward_funcs, list):
for reward_func in trl.reward_funcs:
trainer_kwargs["reward_fn"] = cls.get_reward_func(reward_func)
elif trl.reward_funcs and isinstance(trl.reward_funcs, str):
trainer_kwargs["reward_fn"] = cls.get_reward_func(trl.reward_funcs)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix logic error in reward function handling.

There are two issues with this implementation:

  1. Missing null check: Line 129 accesses cfg.trl without verifying it exists, which could cause an AttributeError.
  2. Logic error: When trl.reward_funcs is a list, the loop overwrites trainer_kwargs["reward_fn"] in each iteration, meaning only the last reward function is actually set.

Apply this diff to fix both issues:

-        trl: TRLConfig = cfg.trl
-        if trl.reward_funcs and isinstance(trl.reward_funcs, list):
-            for reward_func in trl.reward_funcs:
-                trainer_kwargs["reward_fn"] = cls.get_reward_func(reward_func)
-        elif trl.reward_funcs and isinstance(trl.reward_funcs, str):
-            trainer_kwargs["reward_fn"] = cls.get_reward_func(trl.reward_funcs)
+        if cfg.trl:
+            trl: TRLConfig = cfg.trl
+            if trl.reward_funcs and isinstance(trl.reward_funcs, list):
+                # Handle multiple reward functions - this may need different logic
+                # depending on how the trainer expects to receive multiple functions
+                trainer_kwargs["reward_fn"] = [cls.get_reward_func(reward_func) for reward_func in trl.reward_funcs]
+            elif trl.reward_funcs and isinstance(trl.reward_funcs, str):
+                trainer_kwargs["reward_fn"] = cls.get_reward_func(trl.reward_funcs)

Note: Please verify how the trainer expects to receive multiple reward functions - as a list or if only one function should be set.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
trl: TRLConfig = cfg.trl
if trl.reward_funcs and isinstance(trl.reward_funcs, list):
for reward_func in trl.reward_funcs:
trainer_kwargs["reward_fn"] = cls.get_reward_func(reward_func)
elif trl.reward_funcs and isinstance(trl.reward_funcs, str):
trainer_kwargs["reward_fn"] = cls.get_reward_func(trl.reward_funcs)
if cfg.trl:
trl: TRLConfig = cfg.trl
if trl.reward_funcs and isinstance(trl.reward_funcs, list):
# Handle multiple reward functions - this may need different logic
# depending on how the trainer expects to receive multiple functions
trainer_kwargs["reward_fn"] = [
cls.get_reward_func(reward_func)
for reward_func in trl.reward_funcs
]
elif trl.reward_funcs and isinstance(trl.reward_funcs, str):
trainer_kwargs["reward_fn"] = cls.get_reward_func(trl.reward_funcs)
🤖 Prompt for AI Agents
In src/axolotl/core/trainers/grpo/__init__.py around lines 129 to 135, first add
a check to ensure cfg.trl exists before accessing it to avoid AttributeError.
Then, fix the logic for handling trl.reward_funcs when it is a list by
collecting all reward functions into a list and assigning that list to
trainer_kwargs["reward_fn"] instead of overwriting it in each loop iteration.
Confirm whether the trainer expects a list of reward functions or a single
function and adjust accordingly.

@github-actions github-actions Bot temporarily deployed to preview June 3, 2025 02:46 Inactive
@github-actions github-actions Bot temporarily deployed to preview June 3, 2025 02:54 Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/axolotl/core/trainers/grpo/args.py (1)

16-16: Good addition, but consider adding documentation.

The new sequence_parallel_degree field is well-implemented with appropriate type annotation and default value. However, consider adding a docstring or inline comment to explain the purpose and expected values for this configuration field.

-    sequence_parallel_degree: int | None = None
+    sequence_parallel_degree: int | None = None  # Degree of sequence parallelism for distributed training

Alternatively, you could add a docstring to the field or expand the class docstring to include information about all configuration options.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e75864e and 5a8bf5b.

📒 Files selected for processing (2)
  • src/axolotl/core/trainers/grpo/__init__.py (3 hunks)
  • src/axolotl/core/trainers/grpo/args.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/axolotl/core/trainers/grpo/init.py
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)

@github-actions github-actions Bot temporarily deployed to preview June 3, 2025 03:12 Inactive
@github-actions github-actions Bot temporarily deployed to preview June 3, 2025 03:32 Inactive
@github-actions github-actions Bot temporarily deployed to preview June 3, 2025 03:35 Inactive
Comment on lines +139 to +154
self.sp_group = None
self.rank = dist.get_rank()
self.world_size = dist.get_world_size()
self.local_rank = 0
self.local_world_size = 1

def train(self, *args, **kwargs):
# Initialize the SP group
self.sp_group = get_ring_attn_group()
self.rank = dist.get_rank()
self.world_size = dist.get_world_size()
self.local_rank = dist.get_rank(group=self.sp_group)
self.local_world_size = dist.get_world_size(group=self.sp_group)

return super().train(*args, **kwargs)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djsaunde Does this seem sane? we talked about this the other day since we don't have ring attention groups initialized until the train is called with the context manager.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, seems fine, though you don't need to re-set self.rank or self.world_size.

@github-actions github-actions Bot temporarily deployed to preview June 4, 2025 20:36 Inactive
Comment thread requirements.txt Outdated
accelerate==1.7.0
datasets==3.6.0
deepspeed>=0.17.0
trl==0.18.0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TRL released a patch https://github.com/huggingface/trl/releases/tag/v0.18.1 but not sure if there's anything that we depend on

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_packed_dataset.py (1)

76-154: Clarify the sequence length vs batch shape discrepancy.

The test configures sequence_len: 1024 but asserts that batch shapes are (1, 8192). Please clarify:

  1. Is this 8x multiplier intentional for testing sample packing behavior?
  2. Does the multipack sampler concatenate sequences to achieve this shape?
  3. Should the test include a comment explaining this expected behavior?

The test structure and LoRA configuration look good otherwise.

src/axolotl/core/trainers/base.py (1)

175-260: Consider breaking down the large method for better maintainability.

The _get_dataloader method is quite comprehensive but could benefit from being broken into smaller, focused helper methods for better readability and testability.

Consider extracting logical chunks into helper methods such as:

  • _preprocess_dataset_for_dataloader(dataset, is_training, description)
  • _build_dataloader_params(batch_size, data_collator, sampler_fn, dataset, is_training)
  • _cache_eval_dataloader_if_needed(dataloader, dataloader_key)

This would make the main method more focused and easier to test individual components.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a8bf5b and 82611e7.

📒 Files selected for processing (30)
  • .github/workflows/multi-gpu-e2e.yml (1 hunks)
  • .github/workflows/tests.yml (3 hunks)
  • _quarto.yml (0 hunks)
  • docs/multimodal.qmd (1 hunks)
  • examples/gemma3/gemma-3-4b-qlora.yml (1 hunks)
  • examples/gemma3/gemma-3-4b-vision-qlora.yml (1 hunks)
  • examples/llama-3-vision/lora-11b.yaml (1 hunks)
  • examples/llava/lora-7b.yaml (1 hunks)
  • examples/mistral/mistral-small-3.1-24B-lora.yml (1 hunks)
  • examples/pixtral/lora-12b.yml (1 hunks)
  • requirements.txt (1 hunks)
  • scripts/cutcrossentropy_install.py (1 hunks)
  • setup.py (1 hunks)
  • src/axolotl/core/builders/causal.py (1 hunks)
  • src/axolotl/core/trainers/base.py (4 hunks)
  • src/axolotl/core/trainers/grpo/__init__.py (3 hunks)
  • src/axolotl/core/trainers/grpo/args.py (1 hunks)
  • src/axolotl/core/trainers/grpo/trainer.py (1 hunks)
  • src/axolotl/integrations/cut_cross_entropy/monkeypatch/mllama.py (0 hunks)
  • src/axolotl/loaders/patch_manager.py (0 hunks)
  • src/axolotl/monkeypatch/accelerate/fsdp2.py (2 hunks)
  • src/axolotl/monkeypatch/attention/mllama.py (0 hunks)
  • src/axolotl/monkeypatch/gemma3.py (0 hunks)
  • src/axolotl/monkeypatch/lora_kernels.py (1 hunks)
  • src/axolotl/train.py (1 hunks)
  • tests/e2e/integrations/test_kd.py (1 hunks)
  • tests/e2e/multigpu/solo/test_grpo.py (1 hunks)
  • tests/e2e/test_llama_vision.py (2 hunks)
  • tests/e2e/test_optimizers.py (2 hunks)
  • tests/test_packed_dataset.py (2 hunks)
💤 Files with no reviewable changes (5)
  • _quarto.yml
  • src/axolotl/loaders/patch_manager.py
  • src/axolotl/integrations/cut_cross_entropy/monkeypatch/mllama.py
  • src/axolotl/monkeypatch/attention/mllama.py
  • src/axolotl/monkeypatch/gemma3.py
✅ Files skipped from review due to trivial changes (3)
  • setup.py
  • .github/workflows/multi-gpu-e2e.yml
  • docs/multimodal.qmd
🚧 Files skipped from review as they are similar to previous changes (17)
  • requirements.txt
  • tests/e2e/integrations/test_kd.py
  • src/axolotl/core/trainers/grpo/args.py
  • tests/e2e/test_llama_vision.py
  • scripts/cutcrossentropy_install.py
  • examples/gemma3/gemma-3-4b-qlora.yml
  • examples/gemma3/gemma-3-4b-vision-qlora.yml
  • .github/workflows/tests.yml
  • examples/pixtral/lora-12b.yml
  • examples/llama-3-vision/lora-11b.yaml
  • src/axolotl/core/builders/causal.py
  • examples/mistral/mistral-small-3.1-24B-lora.yml
  • tests/e2e/test_optimizers.py
  • examples/llava/lora-7b.yaml
  • src/axolotl/core/trainers/grpo/init.py
  • src/axolotl/monkeypatch/lora_kernels.py
  • tests/e2e/multigpu/solo/test_grpo.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/core/trainers/base.py (1)
src/axolotl/core/trainers/grpo/trainer.py (1)
  • _get_train_sampler (155-175)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
🔇 Additional comments (12)
src/axolotl/train.py (1)

207-207: LGTM! Good defensive programming practice.

The additional truthiness check prevents adding None or falsy reference models to the models list, which could cause issues in the SequenceParallelContextManager. This change aligns well with the sequence parallelism improvements.

tests/test_packed_dataset.py (1)

9-18: LGTM! Appropriate imports for the new test functionality.

The imported modules correctly support the new LoRA packing test with CLI args, dataset loading, trainer setup, and configuration utilities.

src/axolotl/core/trainers/grpo/trainer.py (1)

139-154: LGTM! Good architectural improvement for sequence parallel initialization.

Moving the sequence parallel group initialization from constructor to the train method ensures that:

  1. The SP group is available when training starts (not during construction)
  2. Distributed training setup is complete before initialization
  3. Fresh rank and world size information is obtained at training time

The initialization logic correctly updates both global and local (SP group) rank/world size information before delegating to the superclass.

src/axolotl/monkeypatch/accelerate/fsdp2.py (3)

33-58: Well-designed helper functions for parameter handling.

The _infer_parameter_dtype and _cast_and_contiguous helper functions provide clean abstractions for:

  1. Dtype inference with special handling for float8_e4m3fn and floating-point types
  2. LoRA parameter support through submodule attribute access
  3. Contiguity checking and casting operations

The separation of concerns makes the code more maintainable and testable.


70-77: Improved tensor casting and contiguity handling.

The refactored code properly addresses tensor dtype casting and contiguity requirements using the helper functions. This addresses the previous review concern about the additional .contiguous() call by making the contiguity handling explicit and conditional based on the original parameter properties.

Also applies to: 89-96


99-101: Correct use of assign=True for meta device parameters.

Using assign=True in load_state_dict is appropriate for meta device parameters, as it allows direct assignment without device/dtype mismatches that would occur with normal loading.

src/axolotl/core/trainers/base.py (6)

9-10: Import additions look appropriate.

The new imports partial, wraps, Callable, Literal, and Optional are all used in the refactored code and support the new functionality.


23-23: The seed_worker import is now correctly used.

This addresses the previous review comment about unused imports - seed_worker is now used in the _get_dataloader method on lines 237-240.


116-142: Good refactor with explicit dataset parameter.

The addition of the optional train_dataset parameter and its usage on line 142 makes the sampler creation more explicit and testable. The method correctly handles curriculum sampling and sample packing cases.


147-171: Eval sampler logic is consistent and well-structured.

The evaluation sampler method follows the same pattern as the training sampler and correctly handles the multipack evaluation logic based on the configuration flags.


175-213: The dataloader preprocessing logic resolves previous conditional issues.

The new conditional structure for dataset preprocessing is much cleaner than the previous implementation and correctly handles the various combinations of training/evaluation modes with sample packing. This addresses the contradictory logic flagged in previous reviews.


214-260: Comprehensive dataloader parameter construction with good caching support.

The unified dataloader parameter construction handles all the necessary configurations including batch sampling, worker settings, and sampler integration. The addition of evaluation dataloader caching (lines 252-257) for persistent workers is a good optimization.

Comment on lines +202 to +206
batch_size = (
batch_size
if self.args.sample_packing
else self.args.per_device_eval_batch_size
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix redundant conditional in batch size assignment.

The ternary condition on lines 203-205 is redundant because we're already in a branch where self.args.sample_packing is True, so the condition will always evaluate to the first case.

-            batch_size = (
-                batch_size
-                if self.args.sample_packing
-                else self.args.per_device_eval_batch_size
-            )
+            # batch_size remains unchanged since sample_packing is already True

Alternatively, if this was intended to set a different batch size for evaluation, clarify the logic:

-            batch_size = (
-                batch_size
-                if self.args.sample_packing
-                else self.args.per_device_eval_batch_size
-            )
+            # Use the passed batch_size for evaluation with sample packing
+            pass

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/axolotl/core/trainers/base.py around lines 202 to 206, the batch_size
assignment uses a redundant conditional checking self.args.sample_packing, which
is always True in this context. Simplify the assignment by directly setting
batch_size to the value corresponding to the sample_packing condition without
the ternary operator, or clarify and adjust the logic if different batch sizes
are intended for evaluation scenarios.

Copy link
Copy Markdown
Contributor

@salmanmohammadi salmanmohammadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @djsaunde could you take a pass?

class AxolotlGRPOConfig(AxolotlTrainingMixins, GRPOConfig):
"""Axolotl GRPO Config for GRPO training"""

sequence_parallel_degree: int | None = None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? AFAICT this isn't used

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have this code in AxolotlGRPOSequenceParallelTrainer:

num_sp_groups = num_processes // self.args.sequence_parallel_degree

Copy link
Copy Markdown
Collaborator

@djsaunde djsaunde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's an easier way to update our custom dataloader logic, seems annoying to have to update whenever upstream changes

@github-actions github-actions Bot temporarily deployed to preview June 4, 2025 21:06 Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
src/axolotl/cli/vllm_serve.py (1)

83-141: 💡 Verification agent

🧩 Analysis chain

Add error handling and document maintenance risks.

The monkey patching approach has maintenance risks and the Connection operations lack error handling.

  1. Add error handling for Connection operations:
         # Send ready signal to parent process
-        connection.send({"status": "ready"})
+        try:
+            connection.send({"status": "ready"})
+        except Exception as e:
+            # Log the error or handle appropriately
+            raise RuntimeError(f"Failed to send ready signal: {e}")

         while True:
             # Wait for commands from the parent process
             try:
                 command = connection.recv()
             except KeyboardInterrupt:
                 llm.collective_rpc(method="close_communicator")
                 break
+            except Exception as e:
+                # Log the error or handle appropriately
+                raise RuntimeError(f"Failed to receive command: {e}")
  1. Document the maintenance risk with a comment:
 def patch_vllm_worker():
+    """
+    Monkey patch the vllm worker to add reasoning support.
+    
+    WARNING: This patch modifies the internal implementation of trl.scripts.vllm_serve.llm_worker.
+    It may break if the upstream implementation changes. Consider contributing this feature
+    upstream or finding a more maintainable approach.
+    """
     from multiprocessing.connection import Connection

Let me verify if the upstream llm_worker function signature is compatible:


🏁 Script executed:

#!/bin/bash
# Description: Check the signature of the original llm_worker function in trl
# to ensure our patch is compatible

# Search for the llm_worker function definition in the trl package
rg -A 5 "def llm_worker" --glob "**/*.py" | grep -E "(def llm_worker|script_args|ScriptArguments)"

Length of output: 225


Add error handling and document the maintenance risk of monkey‐patching

The patch_vllm_worker helper replaces trl.scripts.vllm_serve.llm_worker in place, which may break when the upstream implementation changes. It also doesn’t guard against failures in the Connection API. Please:

  1. Annotate the maintenance risk at the top of patch_vllm_worker with a clear docstring warning.
  2. Wrap all connection.send and connection.recv calls in try/except blocks and rethrow or log a clear error.

Suggested diff in src/axolotl/cli/vllm_serve.py:

 def patch_vllm_worker():
+    """
+    Monkey-patch the vLLM worker to add reasoning support.
+
+    WARNING: This overrides `trl.scripts.vllm_serve.llm_worker` internally.
+    Upstream changes to that implementation may silently break this patch.
+    Consider contributing this feature upstream or adopting a more robust extension mechanism.
+    """
     from multiprocessing.connection import Connection

@@
-        # Send ready signal to parent process
-        connection.send({"status": "ready"})
+        # Send ready signal to parent process
+        try:
+            connection.send({"status": "ready"})
+        except Exception as e:
+            raise RuntimeError(f"[vllm_serve] failed to send ready signal: {e}")

         while True:
             # Wait for commands from the parent process
-            try:
-                command = connection.recv()
-            except KeyboardInterrupt:
-                llm.collective_rpc(method="close_communicator")
-                break
+            try:
+                command = connection.recv()
+            except KeyboardInterrupt:
+                llm.collective_rpc(method="close_communicator")
+                break
+            except Exception as e:
+                raise RuntimeError(f"[vllm_serve] failed to receive command: {e}")

             # Handle commands
             if command["type"] in ["call", "fire_and_forget"]:
🧹 Nitpick comments (1)
src/axolotl/cli/vllm_serve.py (1)

23-23: Consider Python version compatibility for type hints.

The type hint bool | None uses Python 3.10+ union syntax. If the project needs to support older Python versions, consider using Optional[bool] instead.

For broader Python version compatibility:

-    enable_reasoning: bool | None = field(default=None, kw_only=True)
+    enable_reasoning: Optional[bool] = field(default=None, kw_only=True)

Also add the import:

-from typing import Union
+from typing import Optional, Union
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82611e7 and 15f8090.

📒 Files selected for processing (4)
  • requirements.txt (1 hunks)
  • src/axolotl/cli/args.py (1 hunks)
  • src/axolotl/cli/vllm_serve.py (3 hunks)
  • src/axolotl/utils/schemas/vllm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/axolotl/utils/schemas/vllm.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • requirements.txt
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.6.0)

Comment thread src/axolotl/cli/args.py
Comment on lines +91 to +98
enable_reasoning: Optional[bool] = field(
default=None,
)

reasoning_parser: Optional[str] = field(
default=None,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add metadata with help descriptions for consistency.

The new fields enable_reasoning and reasoning_parser lack metadata with help descriptions, while all other fields in the VllmServeCliArgs dataclass include them. This inconsistency affects CLI discoverability and user experience.

Apply this diff to add descriptive metadata:

 enable_reasoning: Optional[bool] = field(
     default=None,
+    metadata={
+        "help": "Enable reasoning mode for VLLM generation. When enabled, the model can perform multi-step reasoning."
+    },
 )
 
 reasoning_parser: Optional[str] = field(
     default=None,
+    metadata={
+        "help": "Specify the reasoning parser to use for parsing model outputs in reasoning mode."
+    },
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
enable_reasoning: Optional[bool] = field(
default=None,
)
reasoning_parser: Optional[str] = field(
default=None,
)
enable_reasoning: Optional[bool] = field(
default=None,
metadata={
"help": "Enable reasoning mode for VLLM generation. When enabled, the model can perform multi-step reasoning."
},
)
reasoning_parser: Optional[str] = field(
default=None,
metadata={
"help": "Specify the reasoning parser to use for parsing model outputs in reasoning mode."
},
)
🤖 Prompt for AI Agents
In src/axolotl/cli/args.py around lines 91 to 98, the fields enable_reasoning
and reasoning_parser are missing metadata with help descriptions, unlike other
fields in the VllmServeCliArgs dataclass. Add metadata to both fields with
appropriate help strings describing their purpose to maintain consistency and
improve CLI usability.

@winglian winglian merged commit c67910f into main Jun 5, 2025
28 of 31 checks passed
@winglian winglian deleted the hf-upgrades branch June 5, 2025 14:20
@coderabbitai coderabbitai Bot mentioned this pull request Jul 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants