This notebook demonstrates fine-tuning pretrained models from Hugging Face on text classification datasets, taken either from the Hugging Face Datasets catalog or from a custom dataset. The IMDb Large Movie Review dataset is loaded from the Hugging Face Datasets catalog, and the SMS Spam Collection dataset serves as an example of a custom dataset loaded from a CSV file.
The notebook uses Intel® Extension for PyTorch*, which extends PyTorch with optimizations for an extra performance boost on Intel hardware.
The notebook performs the following steps:
- Import dependencies and setup parameters
- Prepare the dataset
- Prepare the model for fine-tuning and evaluation
- Export the model
- Reload the model and make predictions
- Get explanations with Intel Explainable AI Tools
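As a rough illustration of the "Prepare the dataset" step for a custom CSV file, the sketch below reads labeled text rows and maps string labels to integer class ids using only the Python standard library. The sample rows, the `label`/`text` column names, the comma delimiter, the `LABEL_TO_ID` mapping, and the `load_labeled_csv` helper are all assumptions made for this example; the notebook itself may load and encode the SMS Spam Collection differently.

```python
import csv
import io

# Hypothetical sample in a "label,text" layout. The real file's column
# names and delimiter may differ; adjust the arguments below to match.
SAMPLE_CSV = """label,text
ham,Are we still meeting for lunch today?
spam,WINNER!! Claim your free prize now
ham,"Sure, see you at 5"
"""

# Assumed mapping from string labels to integer class ids.
LABEL_TO_ID = {"ham": 0, "spam": 1}


def load_labeled_csv(handle, label_col="label", text_col="text", delimiter=","):
    """Return (texts, labels) from a CSV with one label and one text column."""
    texts, labels = [], []
    for row in csv.DictReader(handle, delimiter=delimiter):
        texts.append(row[text_col].strip())
        labels.append(LABEL_TO_ID[row[label_col].strip().lower()])
    return texts, labels


# io.StringIO stands in for an open file; pass open(path, newline="") for a real file.
texts, labels = load_labeled_csv(io.StringIO(SAMPLE_CSV))
print(list(zip(labels, texts)))
```

The resulting lists of texts and integer labels are the usual starting point for tokenization before fine-tuning a text classifier.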
To run PyTorch_Text_Classifier_fine_tuning_with_Attributions.ipynb, install the following dependencies:
- Intel® Explainable AI
pip install intel-transfer-learning-tool==0.6
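Before opening the notebook, it can help to confirm that the pinned package is visible to the current Python environment. The snippet below is a minimal sketch using the standard library's `importlib.metadata`; the `installed_version` helper is a hypothetical name introduced here, and the package name is taken from the install command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package name copied from the pip command above.
version = installed_version("intel-transfer-learning-tool")
if version is None:
    print("intel-transfer-learning-tool is not installed in this environment")
else:
    print(f"intel-transfer-learning-tool {version} is installed")
```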
Dataset Citations
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
url = {http://www.aclweb.org/anthology/P11-1015}
}
@misc{misc_sms_spam_collection_228,
author = {Almeida, Tiago},
title = {{SMS Spam Collection}},
year = {2012},
howpublished = {UCI Machine Learning Repository}
}