From ae21fad5270b5b6e15ff22844d4d49d1deaf7cd6 Mon Sep 17 00:00:00 2001 From: NanoCode012 Date: Thu, 7 Aug 2025 11:56:24 +0700 Subject: [PATCH 1/5] feat(doc): update gpt-oss readme --- examples/gpt-oss/README.md | 59 +++++++++++++++++++++++++++++++++----- 1 file changed, 52 insertions(+), 7 deletions(-) diff --git a/examples/gpt-oss/README.md b/examples/gpt-oss/README.md index 7157806afe..2c1b3383ab 100644 --- a/examples/gpt-oss/README.md +++ b/examples/gpt-oss/README.md @@ -1,9 +1,54 @@ -# OpenAI's GPT-OSS +# Finetune OpenAI's GPT-OSS with Axolotl -GPT-OSS is a 20 billion parameter MoE model trained by OpenAI, released in August 2025. +[GPT-OSS](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) are a family of open-weight MoE models trained by OpenAI, released in August 2025. There are two variants: a 20B and 120B. -- 20B Full Parameter SFT can be trained on 8x48GB GPUs (peak reserved memory @ ~36GiB/GPU) - [YAML](./gpt-oss-20b-fft-fsdp2.yaml) -- 20B LoRA SFT (all linear layers, and experts in last two layers) can be trained a single GPU (peak reserved memory @ ~47GiB) - - removing the experts from `lora_target_parameters` will allow the model to fit around ~44GiB of VRAM - - [YAML](./gpt-oss-20b-sft-lora-singlegpu.yaml) -- 20B Full Parameter SFT with FSDP2 offloading can be trained on 2x24GB GPUs (peak reserved memory @ ~21GiB/GPU) - [YAML](./gpt-oss-20b-fft-fsdp2-offload.yaml) +This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking. + +## Getting started + +1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). You need to install from main as GPT-OSS is only on nightly or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html). + + Here is an example of how to install from main for pip: + +```bash +# Ensure you have Pytorch installed (Pytorch 2.6.0 min) +git clone https://github.com/axolotl-ai-cloud/axolotl.git +cd axolotl + +pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja +pip3 install --no-build-isolation -e '.[flash-attn]' +``` + +2. Choose one of the following configs below for training the 20B model. + +```bash +# LoRA SFT linear layers & 2 experts (1x48GB) +# (only linear layers -> ~44GiB) +axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml + +# FFT SFT with offloading (2x24GB, ~21GiB/GPU) +axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml + +# FFT SFT (8x48gb, ~36GiB/GPU) +axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml +``` + +Note: 120B coming soon! + +### TIPS + +- Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html). +- The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template). + +## Optimization Guides + +- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html) +- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html) + +## Related Resources + +- [GPT-OSS Blog](https://openai.com/index/introducing-gpt-oss/) +- [Axolotl Docs](https://docs.axolotl.ai) +- [Axolotl Website](https://axolotl.ai) +- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl) +- [Axolotl Discord](https://discord.gg/7m9sfhzaf3) From 8aaca28931c94bb88045948423ebc69ae64a37ba Mon Sep 17 00:00:00 2001 From: NanoCode012 Date: Thu, 7 Aug 2025 12:19:24 +0700 Subject: [PATCH 2/5] fix: caps --- examples/gpt-oss/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/gpt-oss/README.md b/examples/gpt-oss/README.md index 2c1b3383ab..0545c3ad55 100644 --- a/examples/gpt-oss/README.md +++ b/examples/gpt-oss/README.md @@ -22,14 +22,14 @@ pip3 install --no-build-isolation -e '.[flash-attn]' 2. Choose one of the following configs below for training the 20B model. ```bash -# LoRA SFT linear layers & 2 experts (1x48GB) +# LoRA SFT linear layers & 2 experts (1x48GB, ~47GiB) # (only linear layers -> ~44GiB) axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml # FFT SFT with offloading (2x24GB, ~21GiB/GPU) axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml -# FFT SFT (8x48gb, ~36GiB/GPU) +# FFT SFT (8x48GB, ~36GiB/GPU) axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml ``` From ddb8b9e9f3820e1fa93db501b42b4a7f697a30bf Mon Sep 17 00:00:00 2001 From: NanoCode012 Date: Thu, 7 Aug 2025 12:45:29 +0700 Subject: [PATCH 3/5] feat: add toolcalling section --- examples/gpt-oss/README.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/examples/gpt-oss/README.md b/examples/gpt-oss/README.md index 0545c3ad55..7a63fbd223 100644 --- a/examples/gpt-oss/README.md +++ b/examples/gpt-oss/README.md @@ -1,6 +1,6 @@ # Finetune OpenAI's GPT-OSS with Axolotl -[GPT-OSS](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) are a family of open-weight MoE models trained by OpenAI, released in August 2025. There are two variants: a 20B and 120B. +[GPT-OSS](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) are a family of open-weight MoE models trained by OpenAI, released in August 2025. There are two variants: 20B and 120B. This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking. @@ -35,6 +35,21 @@ axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml Note: 120B coming soon! +### Tool use + +GPT-OSS has a comprehensive tool understanding. Axolotl supports tool calling datasets for Supervised Fine-tuning. + +Here is an example dataset config: +```yaml +datasets: + - path: Nanobit/text-tools-2k-test + type: chat_template +``` + +See [Nanobit/text-tools-2k-test](https://huggingface.co/datasets/Nanobit/text-tools-2k-test) for the sample dataset. + +Refer to [our docs](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#using-tool-use) for more info. + ### TIPS - Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html). From 5211ba11ae1f66f320b2b3bf8791b7ebdc562445 Mon Sep 17 00:00:00 2001 From: NanoCode012 Date: Thu, 7 Aug 2025 12:45:48 +0700 Subject: [PATCH 4/5] feat: add example tool dataset to docs --- docs/dataset-formats/conversation.qmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/dataset-formats/conversation.qmd b/docs/dataset-formats/conversation.qmd index 733b9f32c6..d53c685983 100644 --- a/docs/dataset-formats/conversation.qmd +++ b/docs/dataset-formats/conversation.qmd @@ -212,10 +212,11 @@ Instead of passing `tools` via the system prompt, an alternative method would be Tools need to follow [JSON schema](https://json-schema.org/learn/getting-started-step-by-step). ::: +Example config for Llama4: ```yaml chat_template: llama4 datasets: - - path: ... + - path: Nanobit/text-tools-2k-test type: chat_template # field_tools: tools # default is `tools` ``` From e425488e7fbccbe21714bcc4a960ee658d606cb8 Mon Sep 17 00:00:00 2001 From: NanoCode012 Date: Thu, 7 Aug 2025 18:15:07 +0700 Subject: [PATCH 5/5] chore: update --- examples/gpt-oss/README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/examples/gpt-oss/README.md b/examples/gpt-oss/README.md index 7a63fbd223..9ecd2c859d 100644 --- a/examples/gpt-oss/README.md +++ b/examples/gpt-oss/README.md @@ -22,18 +22,20 @@ pip3 install --no-build-isolation -e '.[flash-attn]' 2. Choose one of the following configs below for training the 20B model. ```bash -# LoRA SFT linear layers & 2 experts (1x48GB, ~47GiB) -# (only linear layers -> ~44GiB) +# LoRA SFT linear layers & 2 experts (1x48GB @ ~47GiB) +# (only linear layers @ ~44GiB) axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml -# FFT SFT with offloading (2x24GB, ~21GiB/GPU) +# FFT SFT with offloading (2x24GB @ ~21GiB/GPU) axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml -# FFT SFT (8x48GB, ~36GiB/GPU) +# FFT SFT (8x48GB @ ~36GiB/GPU or 4x80GB @ ~46GiB/GPU) axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml ``` -Note: 120B coming soon! +Notes: +- 120B coming soon! +- Memory usage taken from `device_mem_reserved(gib)` from logs. ### Tool use