feat(doc): update gpt-oss readme#3029
📝 Walkthrough
The README for the GPT-OSS example was significantly expanded and restructured. It now includes comprehensive instructions for fine-tuning GPT-OSS models with Axolotl, introduces model variants, provides installation and training steps, and offers additional resources. The previous brief content was replaced with detailed, user-oriented documentation. Additionally, a configuration example for the
Changes
Estimated code review effort
🎯 1 (Trivial) | ⏱️ ~5 minutes
📖 Documentation Preview: https://68948c0b1f12ee7c1dad862c--resonant-treacle-0fd729.netlify.app
Deployed on Netlify from commit e425488
Actionable comments posted: 1
🧹 Nitpick comments (3)
examples/gpt-oss/README.md (3)
1-1: Fix the title spelling (“Fine-tune”).
The verb should be hyphenated (“Fine-tune”); the current form (“Finetune”) is non-standard.
-# Finetune OpenAI's GPT-OSS with Axolotl
+# Fine-tune OpenAI's GPT-OSS with Axolotl
3-5: Tighten wording and plural agreement.
Minor style issues: “open-weight” needs a hyphen, “two variants” doesn’t need an “a”, and the pronoun should be plural.
-[GPT-OSS](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) are a family of open-weight MoE models trained by OpenAI, released in August 2025. There are two variants: a 20B and 120B.
-
-This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking.
+[GPT-OSS](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) is a family of open-weight MoE models released by OpenAI in August 2025. There are two variants: 20 B and 120 B.
+
+This guide shows how to fine-tune them with Axolotl for multi-turn conversations and proper masking.
40-42: Optional: link directly to the “OpenAI messages” spec.
For convenience, you could deep-link to the exact subsection (`#chat_format_openai_messages`) instead of the broader page.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/gpt-oss/README.md(1 hunks)
🔇 Additional comments (1)
examples/gpt-oss/README.md (1)
9-20: Review comment is incorrect: pinned versions exist on PyPI
Both `torch==2.6.0` and `setuptools==75.8.0` are valid releases. Confirmed via:
- `pip3 index versions torch` shows version 2.6.0
- `pip3 index versions setuptools` shows version 75.8.0
No changes needed to the pins.
Likely an incorrect or invalid review comment.
2. Choose one of the following configs below for training the 20B model.

```bash
# LoRA SFT linear layers & 2 experts (1x48GB)
# (only linear layers -> ~44GiB)
axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml

# FFT SFT with offloading (2x24GB, ~21GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml

# FFT SFT (8x48gb, ~36GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml
```
🛠️ Refactor suggestion
Ensure YAML paths exist & clarify GPU requirements.
The commands assume that the three YAML config files live under examples/gpt-oss/. If they are added in this PR, fine; otherwise users will get a “file not found”.
Also consider clarifying memory requirements with upper-case “GB” and consistent spacing.
-# FFT SFT (8x48gb, ~36GiB/GPU)
+# FFT SFT (8 × 48 GB, ~36 GiB/GPU)
📝 Committable suggestion
```bash
# LoRA SFT linear layers & 2 experts (1x48GB)
# (only linear layers -> ~44GiB)
axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml

# FFT SFT with offloading (2x24GB, ~21GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml

# FFT SFT (8 × 48 GB, ~36 GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml
```
🤖 Prompt for AI Agents
In examples/gpt-oss/README.md around lines 22 to 34, clarify that the YAML
config files must exist at the specified paths to avoid file not found errors.
Add a note stating these files should be present or added with the PR. Also,
update the GPU memory requirements to use consistent uppercase "GB" units and
ensure spacing is uniform for readability.
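The “file not found” concern raised in this comment can be checked before a run is launched. The following is a minimal sketch and not part of the PR or of Axolotl: `check_configs` is a hypothetical helper name, and the only assumption is that the config paths are the ones quoted from the README.

```bash
# Hypothetical pre-flight helper (not part of Axolotl): report every config
# path that does not exist and return non-zero if at least one is missing.
check_configs() {
  missing=0
  for cfg in "$@"; do
    if [ ! -f "$cfg" ]; then
      echo "config not found: $cfg" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run from the repository root, `check_configs examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml && axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml` would start training only when the file is present.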
Actionable comments posted: 1
♻️ Duplicate comments (2)
examples/gpt-oss/README.md (2)
22-34: Clarify that the referenced YAML files must exist in the repo.
The three `examples/gpt-oss/*.yaml` paths are assumed to be present, but they are not added in this PR. Readers following the guide will hit “file not found” errors. Add a sentence indicating that these configs live in the repository (or provide links / commit SHAs).
25-33: Normalise memory units for readability.
Lines use a mixture of `48GB`/`24GB` (no space) and `~47GiB`/`21GiB` (GiB, with space before “/GPU”). For consistency and to avoid confusion between decimal GB and binary GiB, pick one style (e.g. “GB”) and apply uniform spacing:
-# LoRA SFT linear layers & 2 experts (1x48GB, ~47GiB)
+# LoRA SFT linear layers & 2 experts (1 × 48 GB, ~47 GB)
and similarly for the other two lines.
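The difference between decimal GB and binary GiB is about 7%, which is why mixing the two units can mislead. A minimal sketch of the conversion follows; `gib_to_gb` is a hypothetical helper name introduced only for illustration, and it assumes `awk` is available.

```bash
# Hypothetical helper: convert binary gibibytes (GiB, 2^30 bytes) to
# decimal gigabytes (GB, 10^9 bytes), rounded to one decimal place.
gib_to_gb() {
  LC_ALL=C awk -v gib="$1" 'BEGIN { printf "%.1f\n", gib * 1073741824 / 1000000000 }'
}
```

For instance, `gib_to_gb 36` prints `38.7`, so a figure of ~36 GiB/GPU corresponds to roughly 38.7 GB.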
🧹 Nitpick comments (1)
examples/gpt-oss/README.md (1)
1-6: Minor wording & style nits.
- “Finetune” → “Fine-tune” (hyphenated verb form).
- “There are two variants: 20B and 120B.” — consider linking each model card individually for convenience.
📜 Review details
📒 Files selected for processing (2)
docs/dataset-formats/conversation.qmd(1 hunks)
examples/gpt-oss/README.md(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/dataset-formats/conversation.qmd
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: for axolotl docker commands, the `--ipc=host` flag should be included by default to prevent shared m...
Learnt from: NanoCode012
PR: axolotl-ai-cloud/axolotl#2854
File: README.md:73-77
Timestamp: 2025-07-02T02:56:20.788Z
Learning: For Axolotl Docker commands, the `--ipc=host` flag should be included by default to prevent shared memory failures that commonly occur with PyTorch DataLoaders and multiprocessing during machine learning training workflows.
Applied to files:
examples/gpt-oss/README.md
📚 Learning: when a method has a todo comment indicating it's a temporary override from upstream (like "todo(wing...
Learnt from: winglian
PR: axolotl-ai-cloud/axolotl#3019
File: src/axolotl/core/trainers/base.py:610-669
Timestamp: 2025-08-07T01:12:27.272Z
Learning: When a method has a TODO comment indicating it's a temporary override from upstream (like "TODO(wing): remove once https://github.com/huggingface/transformers/pull/39866/files is merged"), extensive refactoring suggestions may not be worthwhile since the code will be removed once the upstream changes are available.
Applied to files:
examples/gpt-oss/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: preview
1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). You need to install from main as GPT-OSS is only on nightly or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).

Here is an example of how to install from main for pip:
Add a concrete Docker command and include `--ipc=host`.
The text tells users they “can … use our latest Docker images” but never shows a working `docker run` example. Given past incidents with PyTorch DataLoader shared-memory exhaustion, omitting `--ipc=host` in user docs routinely leads to crashes. Please add an explicit command block such as:
docker run --gpus all --ipc=host -v $PWD:/workspace axolotl/axolotl:latest \
  axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml
This both demonstrates usage and institutionalises the required flag.
🤖 Prompt for AI Agents
In examples/gpt-oss/README.md around lines 9 to 11, the documentation mentions
using Docker images but does not provide a concrete docker run command including
the necessary --ipc=host flag. Add a code block showing a full docker run
example with --gpus all, --ipc=host, volume mounting the current directory, and
running the axolotl train command with the appropriate config file to
demonstrate correct usage and prevent shared-memory issues.
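The shared-memory concern behind `--ipc=host` can be inspected directly. When neither `--ipc=host` nor `--shm-size` is passed, Docker gives a container a 64 MB `/dev/shm` by default, which multi-worker PyTorch DataLoaders can exhaust. The sketch below is only an illustration: `fs_size_kb` is a hypothetical helper name, and it assumes a POSIX `df` and a mount path without spaces.

```bash
# Hypothetical helper: print the total size, in KiB, of the filesystem
# that holds the given path (second column of POSIX `df -Pk` output).
fs_size_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $2 }'
}
```

Running `fs_size_kb /dev/shm` inside the container, once with and once without `--ipc=host`, shows how much shared memory the training process can actually use.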
Actionable comments posted: 0
♻️ Duplicate comments (2)
examples/gpt-oss/README.md (2)
24-34: Normalize memory-size units & mention YAML location.
The comments mix `GB` and `GiB` (1x48GB @ ~47GiB, etc.). Pick one convention (preferably `GB` for HW capacity, `GiB` for runtime) and keep spacing consistent (`48 GB`, `36 GiB/GPU`).
Also state explicitly that the three referenced YAML files live under `examples/gpt-oss/`; without this, new users often get “file not found”.
9-13: Add a concrete `docker run` example with `--ipc=host` (feedback still unresolved).
Readers are told they can “use our latest Docker images” but no working command is shown. Past incidents show omitting `--ipc=host` routinely causes PyTorch shared-memory crashes (see team learning). Please add a ready-to-copy block such as:
docker run --gpus all --ipc=host -v $PWD:/workspace axolotl/axolotl:latest \
  axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml
This both demonstrates usage and institutionalises the required flag.
🧹 Nitpick comments (1)
examples/gpt-oss/README.md (1)
14-14: Spelling nit: use the official “PyTorch” casing.
Replace “Pytorch” with “PyTorch” to match the project’s canonical name.
📜 Review details
📒 Files selected for processing (1)
examples/gpt-oss/README.md(1 hunks)
Description
Update README to be more friendly