
Sentiment analysis laser #274

Merged

Conversation

NIXBLACK11
Contributor

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Nov 28, 2023
@NIXBLACK11 NIXBLACK11 marked this pull request as draft November 28, 2023 17:00

To run the notebook in Google Colab, simply click the "Open in Colab" button below:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12gQUG7rPJvOVeWQkpMFzMiixqwDIdv4W?usp=sharing)
Contributor

@avidale avidale Nov 29, 2023


It seems that you have now two independent notebooks: one in Google Drive (tied to Colab), and another here in Github.

I suggest that instead, we have only one copy of the notebook, the one in Github, and modify the Colab url so that it always loads the version from Github. The url will look like https://colab.research.google.com/github/NIXBLACK11/LASER-fork/blob/Sentiment-analysis-laser/tasks/SentimentAnalysis/SentimentAnalysis.ipynb, only you'll need to update the path in a way that refers to the final destination (the main branch of the LASER repository).

(I found this trick here)

"metadata": {},
"outputs": [],
"source": [
"with open('/content/drive/MyDrive/dataset/train.csv', 'rb') as f:\n",
Contributor


You seem to be using Google Drive here, but there is no code above that mounts it. This is confusing.

Contributor Author


Yes, I have to add that.
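A minimal sketch of the missing mount step (the try/except structure and the local fallback path are assumptions for running outside Colab; `google.colab` is only available inside a Colab runtime):

```python
# Mount Google Drive so that /content/drive/MyDrive/... paths resolve.
try:
    from google.colab import drive  # only available inside Colab
    drive.mount('/content/drive')
    data_path = '/content/drive/MyDrive/dataset/train.csv'
except ImportError:
    # Hypothetical local fallback so the notebook can also run outside Colab.
    data_path = 'dataset/train.csv'

print(data_path)
```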

"metadata": {},
"outputs": [],
"source": [
"with open('/content/drive/MyDrive/dataset/train.csv', 'rb') as f:\n",
Contributor


Can we maybe add above some text about where and how to download this dataset?
Currently, those who open the notebook directly have no idea where to get it.

@NIXBLACK11 NIXBLACK11 marked this pull request as ready for review November 30, 2023 15:11
"source": [
"## Step 3: Download the Dataset\n",
"\n",
"Next, let's acquire a sentiment analysis dataset to train our model. We'll download a dataset from Kaggle and unzip it into a directory named ./dataset. Execute the following commands:\n",


What dataset are you using? Can you put a short description and a link to the Kaggle page presenting the dataset?
Also, I see that some credentials are included in the URL; did you use your own credentials for this?

Contributor Author


Yes, I used my own credentials for this.
I think I can just add steps on how to download the dataset from Kaggle.



Yes, maybe that would be better. It's a bit annoying to have to download the dataset, but I suppose your credentials might expire at a certain point and break the notebook. Isn't there another source to download the dataset with no need for credentials?
@avidale @heffernankevin might have ideas about this.

Contributor

@heffernankevin heffernankevin Dec 5, 2023


Let's maybe use one which we can download from HuggingFace. For this we can use the datasets library:

python -m pip install datasets

An example could be this dataset: https://huggingface.co/datasets/carblacac/twitter-sentiment-analysis
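A hedged sketch of loading that dataset with the `datasets` library (the column names `text` and `feeling` are assumptions taken from the dataset card; the offline fallback sample is invented for illustration):

```python
# Load the suggested tweet-sentiment dataset from the Hugging Face Hub.
try:
    from datasets import load_dataset  # pip install datasets
    ds = load_dataset("carblacac/twitter-sentiment-analysis")
    train_texts = ds["train"]["text"]      # assumed column name
    train_labels = ds["train"]["feeling"]  # assumed column name (0/1 sentiment)
except Exception:
    # Tiny invented fallback so the rest of the notebook can still be
    # exercised without network access or the `datasets` package installed.
    train_texts = ["great movie", "terrible service"]
    train_labels = [1, 0]

print(len(train_texts), len(train_labels))
```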

Contributor

@heffernankevin heffernankevin Dec 5, 2023


Also, I see you're reporting "accuracy" at the end for evaluating the trained model. However, earlier you show that the labels in the tweet dataset are not balanced. If you move to another dataset and the labels are balanced, then you can stick with accuracy. Otherwise, ideally, we should show precision and recall per label (your confusion matrix sheds light on this).
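As a sketch, per-label precision and recall can be read straight off the confusion matrix (the helper name and the example counts are invented for illustration):

```python
def per_label_metrics(cm, labels):
    """Precision and recall per label from a confusion matrix,
    where cm[i][j] counts items with true label i predicted as label j."""
    n = len(labels)
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum: predicted as i
        actual = sum(cm[i])                          # row sum: truly i
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics[label] = (precision, recall)
    return metrics

# Imbalance makes accuracy misleading: a model can score high accuracy
# overall while having poor recall on the minority label.
cm = [[90, 10],   # 100 true "negative" tweets
      [30, 70]]   # 100 true "positive" tweets
print(per_label_metrics(cm, ["negative", "positive"]))
```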

}
],
"source": [
"# Sentiment Prediction with RNN Neural Network and Confusion Matrix\n",


Maybe it would be better to normalize the confusion matrix?
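Row normalization is a small transform; a pure-Python sketch (the function name and example counts are invented for illustration):

```python
def normalize_confusion_matrix(cm):
    """Row-normalize so each row shows, for one true label, the fraction
    of its examples assigned to each predicted label."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized

cm = [[90, 10], [30, 70]]  # invented counts
print(normalize_confusion_matrix(cm))
```

Normalizing by row makes classes of different sizes directly comparable, which matters here given the label imbalance mentioned above.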

@heffernankevin
Contributor

heffernankevin commented Dec 7, 2023

I believe you're training the sentiment model on "eng_Latn". When you then try sentiments on other languages, perhaps just mention in the title that this is technically "zero-shot sentiment prediction" for languages other than English.

eg. "Step 14: Zero-shot Sentiment Prediction for Multilingual Texts"

It's one of the benefits of LASER that such a sentiment model, trained only on English, should hopefully do well in other languages (even though not explicitly trained on them). You can then also remove your first example in "english".

@heffernankevin heffernankevin merged commit 83c07d3 into facebookresearch:MLH-dev Dec 7, 2023
2 checks passed
5 participants