
IMDB Dataset of 50K Movie translated Urdu Reviews

@akkefa akkefa released this 07 May 11:41
· 19 commits to master since this release
1aea16a

Urdu Sentiment Analysis dataset

This is a dataset for binary sentiment classification containing substantially more data than previous
benchmark datasets. We provide a set of 40,000 highly polar movie reviews for training and 10,000 for testing.
To increase the availability of sentiment analysis datasets for a low-resource language like Urdu,
we started from the already available IMDB dataset and translated it into Urdu using Google Translate.
The dataset has two classes, positive and negative.
We chose this dataset because the reviews in each class are highly polar.
It contains 50,000 samples divided equally between the two classes.
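
As a rough illustration of the class balance and the training/testing split described above, here is a minimal Python sketch. The column names `review` and `sentiment`, the helper function names, and the four demo rows are assumptions made for illustration and are not taken from the released files. The release notes also do not say how the 40,000/10,000 split is drawn, so taking the first rows as the training set is an assumption too.

```python
import csv
import io
from collections import Counter


def load_reviews(csv_file):
    """Read (review, sentiment) pairs from an open CSV file with a header row.

    The column names "review" and "sentiment" are assumed, not confirmed.
    """
    reader = csv.DictReader(csv_file)
    return [(row["review"], row["sentiment"]) for row in reader]


def split_train_test(samples, n_train):
    """Use the first n_train samples for training and the rest for testing."""
    return samples[:n_train], samples[n_train:]


def class_counts(samples):
    """Count how many samples carry each sentiment label."""
    return Counter(label for _, label in samples)


# Tiny in-memory stand-in for the real file (made-up rows, not dataset content).
demo_csv = io.StringIO(
    "review,sentiment\n"
    "یہ فلم بہت اچھی تھی,positive\n"
    "یہ فلم بہت بری تھی,negative\n"
    "کہانی بہت اچھی تھی,positive\n"
    "کہانی بہت بری تھی,negative\n"
)

samples = load_reviews(demo_csv)
train, test_set = split_train_test(samples, n_train=3)
counts = class_counts(samples)

print(len(train), len(test_set))
print(counts["positive"], counts["negative"])
```

On the full dataset, the same helpers would be called with a file opened using `encoding="utf-8"` and `n_train=40000`; the two class counts should then come out equal if the file matches the description above.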

Homepage: https://www.kaggle.com/akkefa/imdb-dataset-of-50k-movie-translated-urdu-reviews