This repository contains the code and workflow for fine-tuning the DistilBERT (distilbert-base-uncased) model from Hugging Face on a sentiment analysis task. The dataset used for training is sourced from Kaggle.
This project covers:
- Fine-tuning distilbert-base-uncased for sentiment classification.
- Data preprocessing and tokenization using Hugging Face Transformers.
- Model training and evaluation.
- An inference script to predict sentiment on new text samples.
The three datasets used for fine-tuning are available on Kaggle, and the combined final dataset is hosted on Hugging Face. You can download them using the links below:
- IMDB dataset (Sentiment analysis) in CSV format link
- Sentiment Analysis Dataset link
- Stock News Sentiment Analysis(Massive Dataset) link
- Final dataset on Hugging Face link
The dataset is cleaned, preprocessed, and visualized using Pandas, Matplotlib, and Seaborn. Open and run the notebook:
📜 Notebook: notebooks/data_preprocessing.ipynb
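The exact cleaning and feature steps live in the notebook; the sketch below only illustrates the general shape of this stage. The CSV path and the `text`/`sentiment` column names are assumptions for illustration, not the repo's actual schema.

``` python
# Minimal preprocessing sketch (hypothetical file path and column names).
import pandas as pd
from transformers import AutoTokenizer

df = pd.read_csv("data/sentiment.csv")           # assumed location of the merged Kaggle data
df = df.dropna(subset=["text", "sentiment"])     # drop rows with missing text or label
df = df.drop_duplicates(subset=["text"])         # remove duplicate samples

# Map string labels to the integer ids expected by the model.
label2id = {"negative": 0, "positive": 1}
df["label"] = df["sentiment"].map(label2id)

# Tokenize with the same checkpoint that is later fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encodings = tokenizer(
    df["text"].tolist(),
    truncation=True,
    padding="max_length",
    max_length=256,
)
```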
Clone the repository and install the required dependencies:
``` bash
git clone https://github.com/KaushiML3/Fine-tuning-a-LLM-for-sentiment-analysis.git
cd Fine-tuning-a-LLM-for-sentiment-analysis
pip install -r requirements.txt
```
The DistilBERT model is fine-tuned using Hugging Face's Transformers library. Training includes learning-rate scheduling and evaluation metrics. Open and run the notebook (a minimal training sketch follows below):
- 📜 Notebook: notebook/Fine tune LLM with LoRA for sentiment analysis.ipynb
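The sketch below shows one way this training loop could look with the Trainer API and LoRA adapters via the PEFT library. The dataset name, hyperparameters, and output paths are illustrative placeholders, not the repo's exact configuration.

``` python
# Fine-tuning sketch: DistilBERT + LoRA (illustrative hyperparameters).
import numpy as np
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Wrap the base model with low-rank adapters so only a small set of weights is trained.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],   # DistilBERT attention projection layers
)
model = get_peft_model(model, lora_config)

# Placeholder dataset; the repo trains on its merged Kaggle/Hugging Face dataset.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-5,
    lr_scheduler_type="linear",          # learning-rate scheduling mentioned above
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(
    model=model, args=args,
    train_dataset=dataset["train"], eval_dataset=dataset["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```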
Alternatively, run the training script:
``` bash
python train.py
```
To test the model on new text inputs, run:
``` bash
python app.py
```
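For a quick check outside the app, the snippet below sketches how the fine-tuned checkpoint could be loaded for inference. The checkpoint path and example sentences are placeholders.

``` python
# Inference sketch: load a saved fine-tuned checkpoint and score new text.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="outputs/checkpoint-best",          # hypothetical path to the fine-tuned model
    tokenizer="distilbert-base-uncased",
)

samples = [
    "The product exceeded my expectations.",
    "The stock dropped sharply after the earnings call.",
]
for text in samples:
    print(text, "->", classifier(text)[0])    # e.g. {'label': ..., 'score': ...}
```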
- Sentiment analysis DistilBERT model demo
- Hugging Face for the DistilBERT model.
- Kaggle for the dataset.