From ae21fad5270b5b6e15ff22844d4d49d1deaf7cd6 Mon Sep 17 00:00:00 2001
From: NanoCode012 <nano@axolotl.ai>
Date: Thu, 7 Aug 2025 11:56:24 +0700
Subject: [PATCH 1/5] feat(doc): update gpt-oss readme

---
 examples/gpt-oss/README.md | 59 +++++++++++++++++++++++++++++++++-----
 1 file changed, 52 insertions(+), 7 deletions(-)

diff --git a/examples/gpt-oss/README.md b/examples/gpt-oss/README.md
index 7157806afe..2c1b3383ab 100644
--- a/examples/gpt-oss/README.md
+++ b/examples/gpt-oss/README.md
@@ -1,9 +1,54 @@
-# OpenAI's GPT-OSS
+# Finetune OpenAI's GPT-OSS with Axolotl
 
-GPT-OSS is a 20 billion parameter MoE model trained by OpenAI, released in August 2025.
+[GPT-OSS](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) are a family of open-weight MoE models trained by OpenAI, released in August 2025. There are two variants: a 20B and 120B.
 
-- 20B Full Parameter SFT can be trained on 8x48GB GPUs (peak reserved memory @ ~36GiB/GPU) - [YAML](./gpt-oss-20b-fft-fsdp2.yaml)
-- 20B LoRA SFT (all linear layers, and experts in last two layers) can be trained a single GPU (peak reserved memory @ ~47GiB)
-  - removing the experts from `lora_target_parameters` will allow the model to fit around ~44GiB of VRAM
-  - [YAML](./gpt-oss-20b-sft-lora-singlegpu.yaml)
-- 20B Full Parameter SFT with FSDP2 offloading can be trained on 2x24GB GPUs (peak reserved memory @ ~21GiB/GPU) - [YAML](./gpt-oss-20b-fft-fsdp2-offload.yaml)
+This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking.
+
+## Getting started
+
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). You need to install from main as GPT-OSS is only on nightly or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).
+
+    Here is an example of how to install from main for pip:
+
+```bash
+# Ensure you have Pytorch installed (Pytorch 2.6.0 min)
+git clone https://github.com/axolotl-ai-cloud/axolotl.git
+cd axolotl
+
+pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
+pip3 install --no-build-isolation -e '.[flash-attn]'
+```
+
+2. Choose one of the following configs below for training the 20B model.
+
+```bash
+# LoRA SFT linear layers & 2 experts (1x48GB)
+# (only linear layers -> ~44GiB)
+axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml
+
+# FFT SFT with offloading (2x24GB, ~21GiB/GPU)
+axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml
+
+# FFT SFT (8x48gb, ~36GiB/GPU)
+axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml
+```
+
+Note: 120B coming soon!
+
+### TIPS
+
+- Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
+- The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
+
+## Optimization Guides
+
+- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
+- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
+
+## Related Resources
+
+- [GPT-OSS Blog](https://openai.com/index/introducing-gpt-oss/)
+- [Axolotl Docs](https://docs.axolotl.ai)
+- [Axolotl Website](https://axolotl.ai)
+- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
+- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)

From 8aaca28931c94bb88045948423ebc69ae64a37ba Mon Sep 17 00:00:00 2001
From: NanoCode012 <nano@axolotl.ai>
Date: Thu, 7 Aug 2025 12:19:24 +0700
Subject: [PATCH 2/5] fix: caps

---
 examples/gpt-oss/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/gpt-oss/README.md b/examples/gpt-oss/README.md
index 2c1b3383ab..0545c3ad55 100644
--- a/examples/gpt-oss/README.md
+++ b/examples/gpt-oss/README.md
@@ -22,14 +22,14 @@ pip3 install --no-build-isolation -e '.[flash-attn]'
 2. Choose one of the following configs below for training the 20B model.
 
 ```bash
-# LoRA SFT linear layers & 2 experts (1x48GB)
+# LoRA SFT linear layers & 2 experts (1x48GB, ~47GiB)
 # (only linear layers -> ~44GiB)
 axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml
 
 # FFT SFT with offloading (2x24GB, ~21GiB/GPU)
 axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml
 
-# FFT SFT (8x48gb, ~36GiB/GPU)
+# FFT SFT (8x48GB, ~36GiB/GPU)
 axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml
 ```
 

From ddb8b9e9f3820e1fa93db501b42b4a7f697a30bf Mon Sep 17 00:00:00 2001
From: NanoCode012 <nano@axolotl.ai>
Date: Thu, 7 Aug 2025 12:45:29 +0700
Subject: [PATCH 3/5] feat: add toolcalling section

---
 examples/gpt-oss/README.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/examples/gpt-oss/README.md b/examples/gpt-oss/README.md
index 0545c3ad55..7a63fbd223 100644
--- a/examples/gpt-oss/README.md
+++ b/examples/gpt-oss/README.md
@@ -1,6 +1,6 @@
 # Finetune OpenAI's GPT-OSS with Axolotl
 
-[GPT-OSS](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) are a family of open-weight MoE models trained by OpenAI, released in August 2025. There are two variants: a 20B and 120B.
+[GPT-OSS](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) are a family of open-weight MoE models trained by OpenAI, released in August 2025. There are two variants: 20B and 120B.
 
 This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking.
 
@@ -35,6 +35,21 @@ axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml
 
 Note: 120B coming soon!
 
+### Tool use
+
+GPT-OSS has a comprehensive tool understanding. Axolotl supports tool calling datasets for Supervised Fine-tuning.
+
+Here is an example dataset config:
+```yaml
+datasets:
+  - path: Nanobit/text-tools-2k-test
+    type: chat_template
+```
+
+See [Nanobit/text-tools-2k-test](https://huggingface.co/datasets/Nanobit/text-tools-2k-test) for the sample dataset.
+
+Refer to [our docs](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#using-tool-use) for more info.
+
 ### TIPS
 
 - Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).

From 5211ba11ae1f66f320b2b3bf8791b7ebdc562445 Mon Sep 17 00:00:00 2001
From: NanoCode012 <nano@axolotl.ai>
Date: Thu, 7 Aug 2025 12:45:48 +0700
Subject: [PATCH 4/5] feat: add example tool dataset to docs

---
 docs/dataset-formats/conversation.qmd | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/dataset-formats/conversation.qmd b/docs/dataset-formats/conversation.qmd
index 733b9f32c6..d53c685983 100644
--- a/docs/dataset-formats/conversation.qmd
+++ b/docs/dataset-formats/conversation.qmd
@@ -212,10 +212,11 @@ Instead of passing `tools` via the system prompt, an alternative method would be
 Tools need to follow [JSON schema](https://json-schema.org/learn/getting-started-step-by-step).
 :::
 
+Example config for Llama4:
 ```yaml
 chat_template: llama4
 datasets:
-  - path: ...
+  - path: Nanobit/text-tools-2k-test
     type: chat_template
     # field_tools: tools # default is `tools`
 ```

From e425488e7fbccbe21714bcc4a960ee658d606cb8 Mon Sep 17 00:00:00 2001
From: NanoCode012 <nano@axolotl.ai>
Date: Thu, 7 Aug 2025 18:15:07 +0700
Subject: [PATCH 5/5] chore: update

---
 examples/gpt-oss/README.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/examples/gpt-oss/README.md b/examples/gpt-oss/README.md
index 7a63fbd223..9ecd2c859d 100644
--- a/examples/gpt-oss/README.md
+++ b/examples/gpt-oss/README.md
@@ -22,18 +22,20 @@ pip3 install --no-build-isolation -e '.[flash-attn]'
 2. Choose one of the following configs below for training the 20B model.
 
 ```bash
-# LoRA SFT linear layers & 2 experts (1x48GB, ~47GiB)
-# (only linear layers -> ~44GiB)
+# LoRA SFT linear layers & 2 experts (1x48GB @ ~47GiB)
+# (only linear layers @ ~44GiB)
 axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml
 
-# FFT SFT with offloading (2x24GB, ~21GiB/GPU)
+# FFT SFT with offloading (2x24GB @ ~21GiB/GPU)
 axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml
 
-# FFT SFT (8x48GB, ~36GiB/GPU)
+# FFT SFT (8x48GB @ ~36GiB/GPU or 4x80GB @ ~46GiB/GPU)
 axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml
 ```
 
-Note: 120B coming soon!
+Notes:
+- 120B coming soon!
+- Memory usage taken from `device_mem_reserved(gib)` from logs.
 
 ### Tool use