Skip to content

shirsh10mall/Instruct-QnA-Fine-Tuning-Google-Flan-T5-Large-LLM-QLoRA-PEFT-Open-Orca-Dataset

Repository files navigation

Fine-Tuning Google Flan T5 Large LLM for Instruction-Based Question Answering (Instruct QA) - LLM QLoRA PEFT - Open Orca Dataset

Kaggle Notebook - Fine Tunning: https://www.kaggle.com/code/shailjakanttiwari/qlora-llm-instruct-fine-tuning-flan-t5-large

Kaggle Notebook - Inference: https://www.kaggle.com/code/shailjakanttiwari/qlora-llm-inference-flan-t5-large/notebook

Introduction

This project focuses on the fine-tuning of the Flan T5 Large language model for the task of question answering, guided explicitly by explicit instructions. By leveraging techniques such as Parameter Efficient Fine-Tuning (PEFT) and Transfer Learning, the project aims to enhance the model's performance on the Open Orca dataset, a diverse collection of questions and answers across various topics and complexities.

Objective

The primary goal of this project is to fine-tune the Flan T5 Large model to effectively answer questions from the Open Orca dataset using specific task instructions. This involves implementing PEFT and Transfer Learning methodologies to improve the model's understanding of questions and context, leading to more accurate responses.

Dataset

The Open Orca dataset serves as the foundation for this project. It provides a wide array of question types, including multiple-choice, reasoning, question-and-answer, one-word answers, translation, grammar correction, and math word problems. Each data instance in the dataset includes fields such as 'id,' 'system_prompt,' 'question,' and 'response.' The dataset's diversity and complexity present an ideal challenge for fine-tuning the Flan T5 Large model.

Data Preparation

Selected the Flan T5 Large model as the basis for fine-tuning, considering its token limit of 512 and encoder-decoder architecture. Input consisted of prompts with questions, and output labels were responses. Instances with input token lengths exceeding 512 were manually modified by adjusting the prompt to avoid truncation. Instances surpassing the token limit were removed from the dataset to ensure effective training.

Instruct Fine-Tuning in LLMs

Instruct Fine-Tuning involves equipping pre-trained models for specific tasks through explicit task instructions. For instance, during question answering, the model is provided with instructions like "Find the answer within the given context." This enables the model to grasp the task and context, resulting in improved performance on the desired task.

Code Implementation

The project's code implementation involved several critical steps, which are detailed below:

  1. Model Initialization: The Flan T5 Large language model was initialized as the foundation for fine-tuning.

  2. PEFT Approach: The PEFT approach was used, signified by the parameter "peft_combine." This method involves using a pre-trained PEFT model and PeftConfig to facilitate fine-tuning. The model was loaded in 8-bit quantized format for memory efficiency.

  3. LoRA-based QLoRA Approach: The LoRA-based QLoRA approach was explored as well. This involved initializing the language model using a T5 checkpoint, enabling gradient checkpointing for memory optimization, and preparing the model for efficient fine-tuning using QLoRA.

  4. Dataloader Setup: The data was prepared for training using a dataloader. Columns such as "system_prompt," "question," and "response" were removed from the dataset, and the tokenized data was formatted for compatibility with PyTorch.

  5. Model Head Training: The model's head weights were set as trainable, enabling them to adapt to the specific task being fine-tuned.

  6. Training Process: The model was fine-tuned for three epochs on a subset of the data. The process involved generating responses using various inference techniques, including "new_tokens," "num_beams," "early_stopping," and more.

Techniques and Model Training

The implementation techniques incorporated the Flan T5 Large model and integrated QLoRA for efficient quantization. This led to significant computational cost reduction and space-efficient storage, enabling multi-epoch training and inference on a single GPU. The model was fine-tuned for three epochs on a subset of the Open Orca dataset, yielding promising results.

Concept of QLoRA and PEFT

Parameter Efficient Fine-Tuning (PEFT) is a technique that leverages pre-trained models to achieve high performance on specific tasks by training only a small fraction of the model's parameters. Quantized Low-Rank Adaptation (QLoRA) is a method for model adaptation using quantization and low-rank approximation, leading to more efficient training and inference.

Results

The model was fine-tuned for three epochs on a subset of the Open Orca dataset, demonstrating notable results on 75,000 data points. The project's success is a testament to the efficacy of the Flan T5 Large model, the implementation of PEFT and QLoRA techniques, and the power of instruction-based fine-tuning. The project's trajectory is set for expansion to incorporate additional data points for further improvement and advancement.

Conclusion

In conclusion, the fine-tuning of the Flan T5 Large language model for instruction-based question answering showcases the potential of leveraging instruction-driven techniques for enhancing model performance on specific tasks. By combining concepts from Parameter Efficient Fine-Tuning and Quantized Low-Rank Adaptation, this project presents a sophisticated approach to improving large language model adaptability and efficiency. The results achieved and methodologies explored in this project provide valuable insights into the realm of fine-tuning and transfer learning for complex language understanding tasks.

Few Inference Results:

image

image

image

image

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published