
Want to fine-tune llama3.2-1b on MMLU, Arc_challenge, and gsm8k (maths) #2132

sorobedio opened this issue Dec 8, 2024 · 1 comment

@sorobedio
Hello, everyone,

I’m new to fine-tuning large language models (LLMs), but I have experience with PyTorch. I’m planning to fine-tune LLaMA 3.2-1B (both the base and instruct models) on the MMLU, ARC-Challenge, and GSM8K (math) datasets, using full fine-tuning instead of LoRA. After fine-tuning, I aim to evaluate the models.

Could you please guide me on managing these datasets and share any working examples or resources to get started? Any initial push would be greatly appreciated.

Thank you!

@joecummings
Contributor

Hey @sorobedio - happy to help in this journey!

Seems like you've got a good idea of what you'd like to accomplish! To get started, I'd recommend the following workflow:

  1. Creating a new working project w/ your IDE of choice
  2. pip installing torchtune (either the stable or nightly version)
  3. Copying the Llama3.2-1B example config to your working project w/ tune cp llama3_2/1B_full . or tune cp llama3_2/1B_full_single_device ., depending on whether you want to run finetuning in a distributed fashion or on a single GPU (see the shell sketch after this list)
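
Putting those steps together, a minimal shell session might look something like the following. This is only a sketch: the project directory name and output paths are placeholders, and the tune download step (which fetches the weights you'll fine-tune) assumes you have access to the gated Llama weights on the Hugging Face Hub.

# create a project directory
mkdir llama32-finetune && cd llama32-finetune

# install torchtune (stable shown; see the README for nightly instructions)
pip install torchtune

# download the base weights from the Hugging Face Hub
# (requires access to the gated meta-llama repo)
tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct

# copy the example config into the project (single-device variant shown)
tune cp llama3_2/1B_full_single_device .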

Now that you've got the initial setup, let's talk about what you're actually trying to accomplish. Are you looking to just build the best overall model you can on your own? Do you want to deploy this model for a specific use case? Are you just curious how the finetuning process works for local models? Your answers to these questions affect the direction you might want to go with finetuning and the data you'll want to use.

You mentioned wanting to start with MMLU, ARC-Challenge, and GSM8K. These are interesting ones b/c they're often used more for evaluation than training. As such, when you look on the Hugging Face Datasets Hub for MMLU, you'll see that the splits provided are test and validation, not train. You're still welcome to train on the test set, but when you then evaluate the model, it'll unsurprisingly do very well :)

I might instead suggest taking a look at the subsections of the MMLU benchmark, which include abstract algebra, astronomy, chemistry, etc., and training a model on some interesting data for each section. For example, I found this dataset (camel-ai/chemistry) with lots of questions about chemistry. Let's see how we can train a model on that.

[Screenshot: preview of the camel-ai/chemistry dataset on the Hugging Face Hub]

It looks like the data takes on a very simple Question/Answer structure, so in your config you can specify the dataset like the following:

...
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: camel-ai/chemistry
  split: train
  column_map:
    input: message_1
    output: message_2
...

This uses torchtune's instruct_dataset, which pulls the data directly from the Hugging Face Datasets Hub and maps its columns to the expected input/output fields. See our docs for more information.
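
Before kicking off a run, it can be worth a quick sanity check that the Hub dataset really exposes a train split and the columns the config maps. Here's a minimal sketch using the Hugging Face datasets library directly (purely for inspection, and assuming the dataset loads with default settings):

from datasets import load_dataset

# load the same source the config points at
ds = load_dataset("camel-ai/chemistry", split="train")

print(ds.column_names)       # expect message_1 and message_2 among the columns
print(ds[0]["message_1"])    # a sample question
print(ds[0]["message_2"])    # its answer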

Then, training is as easy as:

tune run full_finetune_single_device --config PATH/TO/YOUR_CONFIG.YAML

Now that you've done some training, you'll also want to evaluate the model! You can point our example eleuther_eval recipe (built on top of the EleutherAI Evaluation Harness) at your finetuned model and tell it the tasks you want to run. You can set up an evaluation config like the following:

# Model Arguments
model:
  _component_: torchtune.models.llama3_2.llama3_2_1b

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /tmp/Llama-3.2-1B-Instruct/original/tokenizer.model
  max_seq_len: null

# Load in the trained weights
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: PATH/TO/YOUR/FINETUNED/MODEL
  checkpoint_files: [model.safetensors]
  output_dir: ./
  model_type: LLAMA3_2

# Environment
device: cuda
dtype: bf16
seed: 1234 # It is not recommended to change this seed, b/c it matches EleutherAI's default seed
log_level: INFO

# EleutherAI specific eval args
tasks: ["mmlu_val_science"] # Defaulting to science as a good subset
limit: null
batch_size: 1
enable_kv_cache: True
max_seq_length: 8192

# Quantization specific args
quantizer: null

Then launch it with:

tune run eleuther_eval --config PATH/TO/YOUR_EVALUATION_RECIPE.YAML
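
Since you also mentioned ARC-Challenge and GSM8K, the EleutherAI harness ships tasks for those as well, so you can evaluate on them by swapping out the tasks line. Task names can vary between harness versions, so double-check against the task list of the version you have installed:

tasks: ["arc_challenge", "gsm8k"]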

Follow-ups

  • I showed you how to run on a single, very simple dataset. For good overall performance, you'll likely want to train on multiple datasets, and some of that data may not be in a basic Question/Answer format, so you'll need to massage it a bit (see the sketch after this list). Check out our tutorials on this here.
  • Depending on the hardware you're using to train the model, you might need to take advantage of some techniques to save memory. You can read more about that here.
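
As an example of that massaging, here's a rough sketch that reshapes ARC-Challenge (which is multiple-choice rather than plain Question/Answer) into the input/output format instruct_dataset expects. The Hub id (allenai/ai2_arc) and column names are what the dataset currently uses, but verify them against your version before training:

from datasets import load_dataset

arc = load_dataset("allenai/ai2_arc", "ARC-Challenge", split="train")

def to_qa(example):
    # fold the lettered answer choices into the question text
    choices = "\n".join(
        f"{label}. {text}"
        for label, text in zip(example["choices"]["label"], example["choices"]["text"])
    )
    return {
        "input": f"{example['question']}\n{choices}",
        "output": example["answerKey"],
    }

# keep only the new input/output columns and write out JSON lines
arc_qa = arc.map(to_qa, remove_columns=arc.column_names)
arc_qa.to_json("arc_challenge_qa.json")

You could then point the dataset config at the local file (e.g. source: json with a data_files entry, the usual load_dataset pattern; the torchtune dataset docs cover the exact kwargs).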

Hope this helped and feel free to reach out with any more questions!
