make always skip_move_to_device default as true#3084
Conversation
📝 WalkthroughWalkthroughUpdated ModelInputConfig in src/axolotl/utils/schemas/model.py to change experimental_skip_move_to_device default from None to True and revised its description to indicate the new default and how to revert to legacy behavior. No other logic or error handling was modified. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Possibly related PRs
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: .coderabbit.yaml 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
|
📖 Documentation Preview: https://68a524a01e20e0293fe576a5--resonant-treacle-0fd729.netlify.app Deployed on Netlify from commit cb18daf |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/gpt-oss/README.md (1)
85-91: Fix typo and improve clarity in SGLang sectionCorrect “infomation” → “information”. Minor style improvement is optional.
Apply this diff:
-SGLang has 0-day support in main, see https://github.com/sgl-project/sglang/issues/8833 for infomation on installing +SGLang has 0-day support in main; see https://github.com/sgl-project/sglang/issues/8833 for information on installing SGLang from source. Once you've installed SGLang, run the following command to launch a SGLang server:
🧹 Nitpick comments (4)
src/axolotl/cli/inference.py (2)
68-71: Guard against empty/missing datasets to prevent IndexErrorIf cfg.chat_template is falsy and cfg.datasets is unset or empty, cfg.datasets[0] will raise. Add a defensive check.
Apply this diff:
- elif cfg.datasets[0].type == "chat_template": - chat_template_str = get_chat_template_from_config( - cfg=cfg, ds_cfg=cfg.datasets[0], tokenizer=tokenizer - ) + elif getattr(cfg, "datasets", None) and cfg.datasets and getattr(cfg.datasets[0], "type", None) == "chat_template": + chat_template_str = get_chat_template_from_config( + cfg=cfg, ds_cfg=cfg.datasets[0], tokenizer=tokenizer + )
126-134: Avoid double-printing with TextStreamer and pass attention_mask like the Gradio pathTextStreamer already prints tokens to stdout; decoding and printing again duplicates output. Also, pass attention_mask when available for consistency with do_inference_gradio.
Apply this diff:
- streamer = TextStreamer(tokenizer) - generated = model.generate( - inputs=batch["input_ids"].to(cfg.device), - generation_config=generation_config, - streamer=streamer, - ) - print("=" * 40) - print(tokenizer.decode(generated["sequences"].cpu().tolist()[0])) + streamer = TextStreamer(tokenizer) + generation_kwargs = { + "inputs": batch["input_ids"].to(cfg.device), + "generation_config": generation_config, + "streamer": streamer, + } + if "attention_mask" in batch: + generation_kwargs["attention_mask"] = batch["attention_mask"].to(cfg.device) + _ = model.generate(**generation_kwargs) + print("=" * 40)examples/gpt-oss/README.md (1)
70-83: Replace bare URL and tighten wording in vLLM sectionUse link formatting to satisfy markdownlint and tweak wording/casing.
Apply this diff:
-GPT-OSS support in vLLM does not exist in a stable release yet. See https://x.com/MaziyarPanahi/status/1955741905515323425 -for more information about using a special vllm-openai docker image for inferencing with vLLM. +GPT-OSS support in vLLM does not exist in a stable release yet. See [this update](https://x.com/MaziyarPanahi/status/1955741905515323425) +for more information about using a special vLLM OpenAI Docker image for inference with vLLM.src/axolotl/utils/schemas/model.py (1)
67-72: Consider making this a plain bool (not Optional) since it now has a concrete defaultWith default=True, a tri-state Optional adds ambiguity and downstream None-handling. If you no longer rely on None, simplify the type.
Apply this diff:
- experimental_skip_move_to_device: bool | None = Field( - default=True, + experimental_skip_move_to_device: bool = Field( + default=True, json_schema_extra={ "description": "Don't move the model to the device before sharding. Set to `false` to revert to legacy behavior." }, )
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
examples/gpt-oss/README.md(1 hunks)src/axolotl/cli/inference.py(1 hunks)src/axolotl/utils/schemas/model.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/cli/inference.py (2)
src/axolotl/utils/chat_templates/base.py (1)
get_chat_template(26-85)src/axolotl/utils/mistral/mistral_tokenizer.py (1)
chat_template(41-43)
🪛 LanguageTool
examples/gpt-oss/README.md
[grammar] ~72-~72: There might be a mistake here.
Context: ...MaziyarPanahi/status/1955741905515323425 for more information about using a speci...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
examples/gpt-oss/README.md
72-72: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: pre-commit
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: pre-commit
- GitHub Check: preview
🔇 Additional comments (1)
src/axolotl/cli/inference.py (1)
67-67: Passing tokenizer into get_chat_template is correct and necessaryThis aligns with the updated signature and avoids ValueErrors in default/fallback paths that require a tokenizer.
| experimental_skip_move_to_device: bool | None = Field( | ||
| default=None, | ||
| default=True, | ||
| json_schema_extra={ | ||
| "description": "Don't move the model to the device before sharding. " | ||
| "This is an experimental feature that may be included in the future as the default." | ||
| "description": "Don't move the model to the device before sharding. Set to `false` to revert to legacy behavior." | ||
| }, | ||
| ) |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Default flip to True changes behavior; verify call sites and docs
Changing default to True modifies runtime behavior for users who relied on the implicit legacy default. Ensure:
- All call sites treat None and True consistently.
- Docs/changelogs communicate the new default and how to revert (set to false).
Run this script to locate references and any docs mentioning this flag or its default:
I can help generate a short “Migration note” snippet for the docs if you want.
🏁 Script executed:
#!/bin/bash
# Find all code usages
rg -n -C2 --type=py '\bexperimental_skip_move_to_device\b'
# Find mentions in docs and examples (may include README/markdown/yaml)
rg -n -C2 '(experimental_skip_move_to_device|skip_move_to_device)' -g '!**/node_modules/**'Length of output: 7356
Document and Align Examples with New Default for experimental_skip_move_to_device
The default for experimental_skip_move_to_device has been flipped to True, which now forces skip_move_to_device=True in all cases unless explicitly set to False. This silently changes device-placement behavior for existing users.
Please ensure:
- Loader logic in
•src/axolotl/utils/schemas/model.py:67–72(default=True)
•src/axolotl/loaders/model.py:827–830(override when not None)
correctly reflects this intent. - All example configs in
examples/gpt-oss/*.yamleither:- Remove
experimental_skip_move_to_device: true(since it’s now the default), or - Explicitly set it to
falsewhere legacy behavior is required.
- Remove
- Documentation and changelog include a Migration Note such as:
“experimental_skip_move_to_devicenow defaults toTrue. Set it tofalseto retain legacy behavior (moving the model to the device before sharding).”
Let me know if you’d like a draft of the migration snippet!
🤖 Prompt for AI Agents
In src/axolotl/utils/schemas/model.py around lines 67–72 and
src/axolotl/loaders/model.py around lines 827–830, the default for
experimental_skip_move_to_device has been flipped to True which changes runtime
behavior; update the loader so it treats None as "use default True" and only
applies an override when the value is explicitly False (i.e., ensure logic sets
skip_move_to_device=True by default and only sets it to False if config
explicitly contains False), update all example configs under examples/gpt-oss/
to remove explicit experimental_skip_move_to_device: true or set
experimental_skip_move_to_device: false where legacy behavior is required, and
add a Migration Note to the docs/changelog stating that
experimental_skip_move_to_device now defaults to True and instructing users to
set it to false to retain legacy behavior.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
f81674b to
cb18daf
Compare
salmanmohammadi
left a comment
There was a problem hiding this comment.
Should we just remove this config field?
My thought was to have this in main for a bit with the fallback option of setting this to None/false in case something breaks. We can strip it all out in the next release. |
Description
We should default to never moving the model to device until we're ready to train/shard as doing so prematurely will lead to OOMs.
Summary by CodeRabbit
Refactor
Documentation