Skip to content

make always skip_move_to_device default as true#3084

Merged
winglian merged 1 commit into
mainfrom
skip-move-to-device-as-default
Aug 26, 2025
Merged

make always skip_move_to_device default as true#3084
winglian merged 1 commit into
mainfrom
skip-move-to-device-as-default

Conversation

@winglian

@winglian winglian commented Aug 20, 2025

Copy link
Copy Markdown
Collaborator

Description

We should default to never moving the model to device until we're ready to train/shard as doing so prematurely will lead to OOMs.

Summary by CodeRabbit

  • Refactor

    • Changed default behavior to skip moving models to device before sharding. This is now enabled by default; set the flag to false to restore legacy behavior.
  • Documentation

    • Updated description to clarify the new default and how to revert to legacy behavior.

@coderabbitai

coderabbitai Bot commented Aug 20, 2025

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Updated ModelInputConfig in src/axolotl/utils/schemas/model.py to change experimental_skip_move_to_device default from None to True and revised its description to indicate the new default and how to revert to legacy behavior. No other logic or error handling was modified.

Changes

Cohort / File(s) Summary of changes
Schema default update
src/axolotl/utils/schemas/model.py
Changed experimental_skip_move_to_device default to True; updated json_schema_extra description to reflect new default and how to revert.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

Suggested reviewers

  • djsaunde

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f81674b and cb18daf.

📒 Files selected for processing (1)
  • src/axolotl/utils/schemas/model.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/axolotl/utils/schemas/model.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: docker-e2e-tests-1st (126, 12.6.3, 3.11, 2.7.1, 1)
  • GitHub Check: docker-e2e-tests-1st (126, 12.6.3, 3.11, 2.6.0, 1, Dockerfile-uv.jinja)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch skip-move-to-device-as-default

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions

github-actions Bot commented Aug 20, 2025

Copy link
Copy Markdown
Contributor

📖 Documentation Preview: https://68a524a01e20e0293fe576a5--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit cb18daf

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/gpt-oss/README.md (1)

85-91: Fix typo and improve clarity in SGLang section

Correct “infomation” → “information”. Minor style improvement is optional.

Apply this diff:

-SGLang has 0-day support in main, see https://github.com/sgl-project/sglang/issues/8833 for infomation on installing
+SGLang has 0-day support in main; see https://github.com/sgl-project/sglang/issues/8833 for information on installing
 SGLang from source. Once you've installed SGLang, run the following command to launch a SGLang server:
🧹 Nitpick comments (4)
src/axolotl/cli/inference.py (2)

68-71: Guard against empty/missing datasets to prevent IndexError

If cfg.chat_template is falsy and cfg.datasets is unset or empty, cfg.datasets[0] will raise. Add a defensive check.

Apply this diff:

-    elif cfg.datasets[0].type == "chat_template":
-        chat_template_str = get_chat_template_from_config(
-            cfg=cfg, ds_cfg=cfg.datasets[0], tokenizer=tokenizer
-        )
+    elif getattr(cfg, "datasets", None) and cfg.datasets and getattr(cfg.datasets[0], "type", None) == "chat_template":
+        chat_template_str = get_chat_template_from_config(
+            cfg=cfg, ds_cfg=cfg.datasets[0], tokenizer=tokenizer
+        )

126-134: Avoid double-printing with TextStreamer and pass attention_mask like the Gradio path

TextStreamer already prints tokens to stdout; decoding and printing again duplicates output. Also, pass attention_mask when available for consistency with do_inference_gradio.

Apply this diff:

-            streamer = TextStreamer(tokenizer)
-            generated = model.generate(
-                inputs=batch["input_ids"].to(cfg.device),
-                generation_config=generation_config,
-                streamer=streamer,
-            )
-        print("=" * 40)
-        print(tokenizer.decode(generated["sequences"].cpu().tolist()[0]))
+            streamer = TextStreamer(tokenizer)
+            generation_kwargs = {
+                "inputs": batch["input_ids"].to(cfg.device),
+                "generation_config": generation_config,
+                "streamer": streamer,
+            }
+            if "attention_mask" in batch:
+                generation_kwargs["attention_mask"] = batch["attention_mask"].to(cfg.device)
+            _ = model.generate(**generation_kwargs)
+        print("=" * 40)
examples/gpt-oss/README.md (1)

70-83: Replace bare URL and tighten wording in vLLM section

Use link formatting to satisfy markdownlint and tweak wording/casing.

Apply this diff:

-GPT-OSS support in vLLM does not exist in a stable release yet. See https://x.com/MaziyarPanahi/status/1955741905515323425
-for more information about using a special vllm-openai docker image for inferencing with vLLM.
+GPT-OSS support in vLLM does not exist in a stable release yet. See [this update](https://x.com/MaziyarPanahi/status/1955741905515323425)
+for more information about using a special vLLM OpenAI Docker image for inference with vLLM.
src/axolotl/utils/schemas/model.py (1)

67-72: Consider making this a plain bool (not Optional) since it now has a concrete default

With default=True, a tri-state Optional adds ambiguity and downstream None-handling. If you no longer rely on None, simplify the type.

Apply this diff:

-    experimental_skip_move_to_device: bool | None = Field(
-        default=True,
+    experimental_skip_move_to_device: bool = Field(
+        default=True,
         json_schema_extra={
             "description": "Don't move the model to the device before sharding. Set to `false` to revert to legacy behavior."
         },
     )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 050210e and f81674b.

📒 Files selected for processing (3)
  • examples/gpt-oss/README.md (1 hunks)
  • src/axolotl/cli/inference.py (1 hunks)
  • src/axolotl/utils/schemas/model.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/cli/inference.py (2)
src/axolotl/utils/chat_templates/base.py (1)
  • get_chat_template (26-85)
src/axolotl/utils/mistral/mistral_tokenizer.py (1)
  • chat_template (41-43)
🪛 LanguageTool
examples/gpt-oss/README.md

[grammar] ~72-~72: There might be a mistake here.
Context: ...MaziyarPanahi/status/1955741905515323425 for more information about using a speci...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
examples/gpt-oss/README.md

72-72: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.7.1)
  • GitHub Check: pre-commit
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: pre-commit
  • GitHub Check: preview
🔇 Additional comments (1)
src/axolotl/cli/inference.py (1)

67-67: Passing tokenizer into get_chat_template is correct and necessary

This aligns with the updated signature and avoids ValueErrors in default/fallback paths that require a tokenizer.

Comment on lines 67 to 72
experimental_skip_move_to_device: bool | None = Field(
default=None,
default=True,
json_schema_extra={
"description": "Don't move the model to the device before sharding. "
"This is an experimental feature that may be included in the future as the default."
"description": "Don't move the model to the device before sharding. Set to `false` to revert to legacy behavior."
},
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Default flip to True changes behavior; verify call sites and docs

Changing default to True modifies runtime behavior for users who relied on the implicit legacy default. Ensure:

  • All call sites treat None and True consistently.
  • Docs/changelogs communicate the new default and how to revert (set to false).

Run this script to locate references and any docs mentioning this flag or its default:

I can help generate a short “Migration note” snippet for the docs if you want.


🏁 Script executed:

#!/bin/bash
# Find all code usages
rg -n -C2 --type=py '\bexperimental_skip_move_to_device\b'

# Find mentions in docs and examples (may include README/markdown/yaml)
rg -n -C2 '(experimental_skip_move_to_device|skip_move_to_device)' -g '!**/node_modules/**'

Length of output: 7356


Document and Align Examples with New Default for experimental_skip_move_to_device

The default for experimental_skip_move_to_device has been flipped to True, which now forces skip_move_to_device=True in all cases unless explicitly set to False. This silently changes device-placement behavior for existing users.

Please ensure:

  • Loader logic in
    src/axolotl/utils/schemas/model.py:67–72 (default=True)
    src/axolotl/loaders/model.py:827–830 (override when not None)
    correctly reflects this intent.
  • All example configs in examples/gpt-oss/*.yaml either:
    • Remove experimental_skip_move_to_device: true (since it’s now the default), or
    • Explicitly set it to false where legacy behavior is required.
  • Documentation and changelog include a Migration Note such as:
    experimental_skip_move_to_device now defaults to True. Set it to false to retain legacy behavior (moving the model to the device before sharding).”

Let me know if you’d like a draft of the migration snippet!

🤖 Prompt for AI Agents
In src/axolotl/utils/schemas/model.py around lines 67–72 and
src/axolotl/loaders/model.py around lines 827–830, the default for
experimental_skip_move_to_device has been flipped to True which changes runtime
behavior; update the loader so it treats None as "use default True" and only
applies an override when the value is explicitly False (i.e., ensure logic sets
skip_move_to_device=True by default and only sets it to False if config
explicitly contains False), update all example configs under examples/gpt-oss/
to remove explicit experimental_skip_move_to_device: true or set
experimental_skip_move_to_device: false where legacy behavior is required, and
add a Migration Note to the docs/changelog stating that
experimental_skip_move_to_device now defaults to True and instructing users to
set it to false to retain legacy behavior.

@codecov

codecov Bot commented Aug 20, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@winglian winglian force-pushed the skip-move-to-device-as-default branch from f81674b to cb18daf Compare August 20, 2025 01:22

@salmanmohammadi salmanmohammadi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just remove this config field?

@winglian

Copy link
Copy Markdown
Collaborator Author

Should we just remove this config field?

My thought was to have this in main for a bit with the fallback option of setting this to None/false in case something breaks. We can strip it all out in the next release.

@winglian winglian merged commit e1131e9 into main Aug 26, 2025
30 of 31 checks passed
@winglian winglian deleted the skip-move-to-device-as-default branch August 26, 2025 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants