This is one of my individual projects.
Machine translation is the automatic translation of a sequence from one language to another. Several architectures can perform this task, for example Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, Sequence to Sequence models, and Transformers. This project uses a Sequence to Sequence model with attention to translate English text (the source language) into Hindi text (the target language).
The objective is to understand and implement the sequence to sequence model with attention, a widely used deep learning architecture for machine translation.
- Python 3.7.10 or above
- NumPy, pandas
- PyTorch 1.9.0 or above
- CUDA 10.2 for faster training
- Google Colab
- Seq2Seq with Attention
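
To make the attention step concrete, here is a minimal, framework-free sketch of one common variant, dot-product attention. It is an illustration only: the scoring function used in this repository may differ (for example, an additive score), and the names `softmax`, `dot_attention`, `encoder_states`, and `decoder_state` are hypothetical, chosen for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Dot-product attention (an assumed variant, not necessarily the repo's).

    Scores each encoder hidden state against the current decoder state,
    normalizes the scores with softmax, and returns the weights together
    with the weighted sum of encoder states (the context vector).
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three source positions, hidden size 2 (made-up numbers).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
weights, context = dot_attention(decoder_state, encoder_states)
print(weights)
print(context)
```

In a full model the decoder would combine `context` with its own state at every output step, so each Hindi token can draw on different parts of the English sentence.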
The dataset can be found in the data directory of this repository. It contains the zip file `Hindi_English_Truncated_Corpus.csv.zip`.
The csv file contains two columns:
- English text
- Hindi text
There are a total of 39881 rows in the dataset. 80% of the dataset was used for training and the remaining 20% for testing.
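
As a rough illustration of the 80/20 split described above, the sketch below divides row indices into training and test sets. Whether the original split was shuffled, and with which seed, is not stated here, so the shuffling, the `seed` value, and the helper name `train_test_indices` are assumptions made for this example.

```python
import random


def train_test_indices(n_rows, train_fraction=0.8, seed=0):
    """Return (train, test) lists of row indices.

    Shuffling with a fixed seed is an assumption for reproducibility;
    the original project may have split the rows differently.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


# 39881 is the row count reported for the dataset.
train_idx, test_idx = train_test_indices(39881)
print(len(train_idx), len(test_idx))  # 31904 7977
```

With the CSV loaded into a pandas DataFrame, the two index lists could then be used to select the training and test rows, for example via `DataFrame.iloc`.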
The model was able to produce high-quality translations.
- Palash Mahendra Kamble - palash04