make always skip_move_to_device default as true by winglian · Pull Request #3084 · axolotl-ai-cloud/axolotl

winglian · 2025-08-20T01:10:55Z

Description

We should default to never moving the model to the device until we're ready to train/shard, as doing so prematurely will lead to OOMs.

Summary by CodeRabbit

Refactor
- Changed the default behavior to skip moving models to the device before sharding. This is now enabled by default; set `experimental_skip_move_to_device` to false to restore the legacy behavior.
Documentation
- Updated description to clarify the new default and how to revert to legacy behavior.

coderabbitai · 2025-08-20T01:11:01Z

Walkthrough

Updated ModelInputConfig in src/axolotl/utils/schemas/model.py to change experimental_skip_move_to_device default from None to True and revised its description to indicate the new default and how to revert to legacy behavior. No other logic or error handling was modified.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Schema default update (`src/axolotl/utils/schemas/model.py`) | Changed `experimental_skip_move_to_device` default to True; updated the `json_schema_extra` description to reflect the new default and how to revert. |
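A minimal sketch of what the default flip means for a config that omits the key. It uses stdlib dataclasses as hypothetical stand-ins for the pydantic `ModelInputConfig` model; the names `LegacyModelInputSketch`, `ModelInputSketch`, and `load` are illustrative only and are not axolotl APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LegacyModelInputSketch:
    # Hypothetical stand-in for the field before this PR (default=None).
    experimental_skip_move_to_device: Optional[bool] = None


@dataclass
class ModelInputSketch:
    # Hypothetical stand-in for the field after this PR (default=True).
    experimental_skip_move_to_device: Optional[bool] = True


def load(cls, user_cfg: dict):
    # Keys the user omits fall back to the class default, much as a schema
    # default fills in fields that are absent from a YAML config.
    return cls(**user_cfg)


print(load(LegacyModelInputSketch, {}).experimental_skip_move_to_device)  # None
print(load(ModelInputSketch, {}).experimental_skip_move_to_device)  # True
opt_out = {"experimental_skip_move_to_device": False}
print(load(ModelInputSketch, opt_out).experimental_skip_move_to_device)  # False
```

Users who never set the key silently move from `None` to `True`, which is why the description spells out how to opt back out.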

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

use skip_move_to_device for all cases #3015 — Adjusts the same experimental_skip_move_to_device field’s default and description; directly overlaps with this change.

Suggested reviewers

djsaunde

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f81674b and cb18daf.

📒 Files selected for processing (1)

src/axolotl/utils/schemas/model.py (1 hunk)

🚧 Files skipped from review as they are similar to previous changes (1)

src/axolotl/utils/schemas/model.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: docker-e2e-tests-1st (126, 12.6.3, 3.11, 2.7.1, 1)
GitHub Check: docker-e2e-tests-1st (126, 12.6.3, 3.11, 2.6.0, 1, Dockerfile-uv.jinja)

github-actions · 2025-08-20T01:16:42Z

📖 Documentation Preview: https://68a524a01e20e0293fe576a5--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit cb18daf

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

examples/gpt-oss/README.md (1)
85-91: Fix typo and improve clarity in SGLang section

Correct “infomation” → “information”. Minor style improvement is optional.

Apply this diff:
```diff
-SGLang has 0-day support in main, see https://github.com/sgl-project/sglang/issues/8833 for infomation on installing
+SGLang has 0-day support in main; see https://github.com/sgl-project/sglang/issues/8833 for information on installing
 SGLang from source. Once you've installed SGLang, run the following command to launch a SGLang server:
```

🧹 Nitpick comments (4)

src/axolotl/cli/inference.py (2)

68-71: Guard against empty/missing datasets to prevent IndexError

If cfg.chat_template is falsy and cfg.datasets is None or empty, cfg.datasets[0] will raise (a TypeError if it is None, an IndexError if it is an empty list). Add a defensive check.

Apply this diff:

```diff
-    elif cfg.datasets[0].type == "chat_template":
-        chat_template_str = get_chat_template_from_config(
-            cfg=cfg, ds_cfg=cfg.datasets[0], tokenizer=tokenizer
-        )
+    elif getattr(cfg, "datasets", None) and getattr(cfg.datasets[0], "type", None) == "chat_template":
+        chat_template_str = get_chat_template_from_config(
+            cfg=cfg, ds_cfg=cfg.datasets[0], tokenizer=tokenizer
+        )
```
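The short-circuit in the guarded branch can be checked in isolation. This sketch uses `types.SimpleNamespace` objects in place of axolotl's config, and `chat_template_source` is a hypothetical helper that returns a label for whichever branch would fire instead of calling `get_chat_template_from_config`.

```python
from types import SimpleNamespace
from typing import Optional


def chat_template_source(cfg) -> Optional[str]:
    """Hypothetical helper mirroring the branch order in the suggested diff."""
    if getattr(cfg, "chat_template", None):
        return "cfg.chat_template"
    datasets = getattr(cfg, "datasets", None)
    # `datasets and ...` short-circuits before indexing, so None (which would
    # raise TypeError) and [] (which would raise IndexError) are both skipped.
    if datasets and getattr(datasets[0], "type", None) == "chat_template":
        return "datasets[0]"
    return None


print(chat_template_source(SimpleNamespace(chat_template=None, datasets=None)))  # None
print(chat_template_source(SimpleNamespace(chat_template=None, datasets=[])))  # None
ds = [SimpleNamespace(type="chat_template")]
print(chat_template_source(SimpleNamespace(chat_template=None, datasets=ds)))  # datasets[0]
```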

126-134: Avoid double-printing with TextStreamer and pass attention_mask like the Gradio path

TextStreamer already prints tokens to stdout; decoding and printing again duplicates output. Also, pass attention_mask when available for consistency with do_inference_gradio.

Apply this diff:

```diff
-            streamer = TextStreamer(tokenizer)
-            generated = model.generate(
-                inputs=batch["input_ids"].to(cfg.device),
-                generation_config=generation_config,
-                streamer=streamer,
-            )
-        print("=" * 40)
-        print(tokenizer.decode(generated["sequences"].cpu().tolist()[0]))
+            streamer = TextStreamer(tokenizer)
+            generation_kwargs = {
+                "inputs": batch["input_ids"].to(cfg.device),
+                "generation_config": generation_config,
+                "streamer": streamer,
+            }
+            if "attention_mask" in batch:
+                generation_kwargs["attention_mask"] = batch["attention_mask"].to(
+                    cfg.device
+                )
+            _ = model.generate(**generation_kwargs)
+        print("=" * 40)
```
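The conditional-kwargs pattern in the suggested diff can be exercised without torch or transformers. This sketch assumes a hypothetical `StandInTensor` class in place of real tensors and a hypothetical `build_generation_kwargs` helper; in the actual CLI the resulting dict would be unpacked into `model.generate(**kwargs)`, and `TextStreamer` would handle printing, so nothing needs to be decoded and printed a second time.

```python
from typing import Any, Dict


class StandInTensor:
    """Hypothetical stand-in for a torch tensor that only tracks its device."""

    def __init__(self, name: str, device: str = "cpu") -> None:
        self.name = name
        self.device = device

    def to(self, device: str) -> "StandInTensor":
        # Return a moved copy instead of mutating in place (Tensor.to is not in-place).
        return StandInTensor(self.name, device)


def build_generation_kwargs(
    batch: Dict[str, Any], device: str, generation_config: Any, streamer: Any
) -> Dict[str, Any]:
    """Collect keyword arguments for a generate() call (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "inputs": batch["input_ids"].to(device),
        "generation_config": generation_config,
        "streamer": streamer,
    }
    # Forward attention_mask only when the tokenized batch actually has one.
    if "attention_mask" in batch:
        kwargs["attention_mask"] = batch["attention_mask"].to(device)
    return kwargs


with_mask = build_generation_kwargs(
    {"input_ids": StandInTensor("ids"), "attention_mask": StandInTensor("mask")},
    device="cuda:0",
    generation_config=None,
    streamer=None,
)
print(sorted(with_mask))  # ['attention_mask', 'generation_config', 'inputs', 'streamer']
print(with_mask["attention_mask"].device)  # cuda:0
```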

examples/gpt-oss/README.md (1)

70-83: Replace bare URL and tighten wording in vLLM section

Use link formatting to satisfy markdownlint and tweak wording/casing.

Apply this diff:

```diff
-GPT-OSS support in vLLM does not exist in a stable release yet. See https://x.com/MaziyarPanahi/status/1955741905515323425
-for more information about using a special vllm-openai docker image for inferencing with vLLM.
+GPT-OSS support in vLLM does not exist in a stable release yet. See [this update](https://x.com/MaziyarPanahi/status/1955741905515323425)
+for more information about using a special vLLM OpenAI Docker image for inference with vLLM.
```

src/axolotl/utils/schemas/model.py (1)

67-72: Consider making this a plain bool (not Optional) since it now has a concrete default

With default=True, a tri-state Optional adds ambiguity and downstream None-handling. If you no longer rely on None, simplify the type.

Apply this diff:
```diff
-    experimental_skip_move_to_device: bool | None = Field(
-        default=True,
+    experimental_skip_move_to_device: bool = Field(
+        default=True,
         json_schema_extra={
             "description": "Don't move the model to the device before sharding. Set to `false` to revert to legacy behavior."
         },
     )
```

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 050210e and f81674b.

📒 Files selected for processing (3)

examples/gpt-oss/README.md (1 hunk)
src/axolotl/cli/inference.py (1 hunk)
src/axolotl/utils/schemas/model.py (1 hunk)

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/axolotl/cli/inference.py (2)

src/axolotl/utils/chat_templates/base.py (1)

get_chat_template (26-85)

src/axolotl/utils/mistral/mistral_tokenizer.py (1)

chat_template (41-43)

🪛 LanguageTool

examples/gpt-oss/README.md

[grammar] ~72-~72: There might be a mistake here.
Context: ...MaziyarPanahi/status/1955741905515323425 for more information about using a speci...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)

examples/gpt-oss/README.md

72-72: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: PyTest (3.11, 2.7.0)
GitHub Check: PyTest (3.11, 2.7.1)
GitHub Check: pre-commit
GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
GitHub Check: PyTest (3.11, 2.6.0)
GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
GitHub Check: pre-commit
GitHub Check: preview

🔇 Additional comments (1)

src/axolotl/cli/inference.py (1)

67-67: Passing tokenizer into get_chat_template is correct and necessary

This aligns with the updated signature and avoids ValueErrors in default/fallback paths that require a tokenizer.

coderabbitai · 2025-08-20T01:17:21Z

```diff
    experimental_skip_move_to_device: bool | None = Field(
-        default=None,
+        default=True,
        json_schema_extra={
-            "description": "Don't move the model to the device before sharding. "
-            "This is an experimental feature that may be included in the future as the default."
+            "description": "Don't move the model to the device before sharding. Set to `false` to revert to legacy behavior."
        },
    )
```


💡 Verification agent

🧩 Analysis chain

Default flip to True changes behavior; verify call sites and docs

Changing the default to True modifies runtime behavior for users who relied on the implicit legacy default. Ensure:

- All call sites treat None and True consistently.
- Docs/changelogs communicate the new default and how to revert (set it to false).

Run this script to locate references and any docs mentioning this flag or its default:

I can help generate a short “Migration note” snippet for the docs if you want.

🏁 Script executed:

```bash
#!/bin/bash
# Find all code usages
rg -n -C2 --type=py '\bexperimental_skip_move_to_device\b'
# Find mentions in docs and examples (may include README/markdown/yaml)
rg -n -C2 '(experimental_skip_move_to_device|skip_move_to_device)' -g '!**/node_modules/**'
```

Length of output: 7356

Document and Align Examples with New Default for experimental_skip_move_to_device

The default for experimental_skip_move_to_device has been flipped to True, which now forces skip_move_to_device=True in all cases unless explicitly set to False. This silently changes device-placement behavior for existing users.

Please ensure:

- The loader logic correctly reflects this intent in:
  - src/axolotl/utils/schemas/model.py:67–72 (default=True)
  - src/axolotl/loaders/model.py:827–830 (override when not None)
- All example configs in examples/gpt-oss/*.yaml either:
  - remove experimental_skip_move_to_device: true (since it's now the default), or
  - explicitly set it to false where legacy behavior is required.
- The documentation and changelog include a migration note such as:
  “experimental_skip_move_to_device now defaults to True. Set it to false to retain legacy behavior (moving the model to the device before sharding).”
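A small stdlib script can list the example configs that still set the flag explicitly to its new default. This is a sketch under stated assumptions: it matches lines textually with a regular expression rather than parsing YAML (so nested or unusual YAML forms would be missed), the names `REDUNDANT_FLAG`, `redundant_flag_lines`, and `find_redundant_flags` are hypothetical, and the `examples/gpt-oss` path is taken from the review comment.

```python
import re
from pathlib import Path
from typing import List

# Matches an explicit `experimental_skip_move_to_device: true` line, allowing
# indentation, any casing of "true", and a trailing YAML comment.
REDUNDANT_FLAG = re.compile(
    r"^\s*experimental_skip_move_to_device\s*:\s*true\s*(#.*)?$", re.IGNORECASE
)


def redundant_flag_lines(text: str) -> List[int]:
    """Return 1-based line numbers that set the flag to its new default."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if REDUNDANT_FLAG.match(line)
    ]


def find_redundant_flags(config_dir: Path, pattern: str = "*.yaml") -> List[str]:
    """Return 'file:line' entries across all matching configs in config_dir."""
    hits: List[str] = []
    for path in sorted(config_dir.glob(pattern)):
        hits.extend(f"{path.name}:{n}" for n in redundant_flag_lines(path.read_text()))
    return hits


if __name__ == "__main__":
    # Assumed location, taken from the review comment; a missing dir yields no hits.
    for hit in find_redundant_flags(Path("examples/gpt-oss")):
        print(hit)
```

Each reported `file:line` can then be deleted, or flipped to `false` where the legacy placement is still needed.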

Let me know if you’d like a draft of the migration snippet!

🤖 Prompt for AI Agents

In src/axolotl/utils/schemas/model.py around lines 67–72 and src/axolotl/loaders/model.py around lines 827–830, the default for experimental_skip_move_to_device has been flipped to True which changes runtime behavior; update the loader so it treats None as "use default True" and only applies an override when the value is explicitly False (i.e., ensure logic sets skip_move_to_device=True by default and only sets it to False if config explicitly contains False), update all example configs under examples/gpt-oss/ to remove explicit experimental_skip_move_to_device: true or set experimental_skip_move_to_device: false where legacy behavior is required, and add a Migration Note to the docs/changelog stating that experimental_skip_move_to_device now defaults to True and instructing users to set it to false to retain legacy behavior.
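The two readings of a `None` value raised above can be compared in a small stdlib sketch. Both functions are hypothetical: `resolve_as_requested` follows the prompt's description (None means "use the default True"), while `resolve_as_override` models the "override when not None" pattern noted for src/axolotl/loaders/model.py, with `loader_default` standing in for whatever the loader would otherwise choose, which this review does not show.

```python
from typing import Optional


def resolve_as_requested(cfg_value: Optional[bool]) -> bool:
    # None means "use the new default" (True); only an explicit False opts out.
    return True if cfg_value is None else bool(cfg_value)


def resolve_as_override(cfg_value: Optional[bool], loader_default: bool) -> bool:
    # An explicit True/False wins; None leaves the loader's own choice in place.
    # loader_default is a hypothetical input, not an axolotl parameter.
    return loader_default if cfg_value is None else bool(cfg_value)


for value in (None, True, False):
    print(
        value,
        resolve_as_requested(value),
        resolve_as_override(value, loader_default=False),
    )
# None True False
# True True True
# False False False
```

Under the override reading, an explicit `None` is a genuine third state that defers to the loader; under the requested reading, `None` and `True` behave identically, which is the ambiguity behind the earlier nitpick about making the field a plain bool.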

codecov · 2025-08-20T01:19:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


salmanmohammadi

Should we just remove this config field?

winglian · 2025-08-20T12:53:45Z

> Should we just remove this config field?

My thought was to have this in main for a bit with the fallback option of setting this to None/false in case something breaks. We can strip it all out in the next release.

coderabbitai Bot reviewed Aug 20, 2025

make always skip_move_to_device default as true

cb18daf

winglian force-pushed the skip-move-to-device-as-default branch from f81674b to cb18daf on August 20, 2025 01:22

salmanmohammadi approved these changes Aug 20, 2025

winglian merged commit e1131e9 into main Aug 26, 2025
30 of 31 checks passed

winglian deleted the skip-move-to-device-as-default branch August 26, 2025 13:30

Uh oh!

Conversation

winglian commented Aug 20, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Aug 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

Status, Documentation and Community

Uh oh!

github-actions Bot commented Aug 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Aug 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

salmanmohammadi left a comment

Choose a reason for hiding this comment

Uh oh!

winglian commented Aug 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

winglian commented Aug 20, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Aug 20, 2025 •

edited

Loading

github-actions Bot commented Aug 20, 2025 •

edited

Loading

codecov Bot commented Aug 20, 2025 •

edited

Loading