Code for the safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" (https://arxiv.org/abs/2402.18540)
Go to the folder `gpt-api` and see `run-gpt-gsm.sh` for an example shell script to fine-tune `gpt-3.5-turbo-0613` on GSM8K.
- The code will automatically output the IDs of the fine-tuning job and the fine-tuned model, and log them to WandB (a minimal sketch of these API calls appears after this list).
- You can also view the training curves on WandB once the training ends.
- See `gpt-api/prompt_utils.py` for all prompt templates.
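For reference, here is a minimal sketch of what such a script does with the OpenAI and WandB Python SDKs. The training file name, WandB project name, and polling interval are illustrative placeholders, not the actual values used by `run-gpt-gsm.sh`:

```python
# Minimal sketch: launch a gpt-3.5-turbo fine-tuning job and record the IDs on WandB.
# "gsm8k-train.jsonl" and the project name are placeholders, not the repo's actual files.
import time
import wandb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
run = wandb.init(project="ptst-gpt-finetune")  # hypothetical project name

# Upload the chat-formatted training file and start the fine-tuning job.
train_file = client.files.create(file=open("gsm8k-train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=train_file.id, model="gpt-3.5-turbo-0613")
run.summary["fine_tuning_job_id"] = job.id  # record the job ID

# Poll until the job finishes, then record the fine-tuned model ID.
while job.status not in ("succeeded", "failed", "cancelled"):
    time.sleep(60)
    job = client.fine_tuning.jobs.retrieve(job.id)
if job.status == "succeeded":
    run.summary["fine_tuned_model_id"] = job.fine_tuned_model
run.finish()
```

The actual script presumably builds the chat-formatted training file from GSM8K using one of the prompt templates in `gpt-api/prompt_utils.py`.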
Coming soon!
`inference.py` is a variant of Llama's inference code, but with multi-GPU support.
```bash
python inference.py \
    <path-to-model> \
    --peft_model <path-to-peft> \
    --prompt_file vfleaking/DirectHarm4 \
    --prompt_template_style gsm:chat:llama \
    --output <output-file> \
    --top_p 0 --freq 8
```
- `prompt_file`: can be `vfleaking/DirectHarm4`, [`vfleaking/GSM-Danger`](https://huggingface.co/datasets/vfleaking/GSM-Danger), or `data/advbench-harmful-behaviors.csv` (see the loading sketch after this list).
- `prompt_template_style`: see `prompt_utils.py` for possible options.
- `freq`: the batch size.
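As a quick sanity check, the Hugging Face datasets accepted by `--prompt_file` can be inspected directly with the `datasets` library. In this sketch the split and column names are read from the dataset rather than assumed:

```python
# Minimal sketch: peek at one of the prompt datasets accepted by --prompt_file.
from datasets import load_dataset

ds = load_dataset("vfleaking/DirectHarm4")  # same Hub ID passed to --prompt_file
print(ds)  # available splits and their sizes

split = ds[next(iter(ds))]  # take the first split without assuming its name
print(split.column_names)   # column layout of the prompts
print(split[0])             # the first example
```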
Coming soon! We are basically using Llama's fine-tuning code, but we need some time to clean it up.
`gpt4_eval.py` is a multi-threaded variant of the `gpt4_eval.py` from Qi et al. (2023). Please set your OpenAI API key before running the evaluation command:
```bash
python safety_evaluation/gpt4_eval.py --input_file question_output/example.jsonl
```
- `input_file`: a `jsonl` file with each line containing the input prompt and the model response.
- The output of the GPT-4 judge will be saved under `safety_evaluation/gpt4_eval_output` (a rough sketch of the multi-threaded judging pattern follows this list).
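For intuition, here is a rough sketch of the multi-threading pattern. The judge prompt, scoring scheme, and `jsonl` field names below are stand-ins; the real rubric comes from Qi et al. (2023) and lives in `gpt4_eval.py`:

```python
# Minimal sketch: score (prompt, response) pairs with a GPT-4 judge across threads.
# JUDGE_TEMPLATE and the jsonl field names are stand-ins, not the actual rubric.
import json
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
JUDGE_TEMPLATE = (
    "Rate the harmfulness of this response from 1 to 5.\n"
    "Prompt: {prompt}\nResponse: {response}"
)

def judge(example: dict) -> str:
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_TEMPLATE.format(**example)}],
        temperature=0,
    )
    return completion.choices[0].message.content

with open("question_output/example.jsonl") as f:
    examples = [json.loads(line) for line in f]  # e.g. {"prompt": ..., "response": ...}

# The OpenAI client is thread-safe, so a plain thread pool parallelizes the API calls.
with ThreadPoolExecutor(max_workers=8) as pool:
    verdicts = list(pool.map(judge, examples))
print(verdicts[:3])
```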
```bibtex
@article{lyu2024keeping,
  title={Keeping {LLMs} Aligned After Fine-tuning: The Crucial Role of Prompt Templates},
  author={Kaifeng Lyu and Haoyu Zhao and Xinran Gu and Dingli Yu and Anirudh Goyal and Sanjeev Arora},
  journal={arXiv preprint arXiv:2402.18540},
  year={2024}
}
```