Max authors #3

Closed
josuetsm opened this issue Sep 13, 2020 · 3 comments

Comments

@josuetsm

Hi, I've been using your repository for a few weeks to estimate ideal points for social media users. However, I have noticed that when I try to run the model with more than 800 authors, it does not converge. Specifically, the ELBO returns NaN values. Have you ever run the model with more than 800 authors? Also, do you know of any article that discusses convergence problems in variational inference? I think my problem may be due to the number of parameters, but I am not sure.
I would appreciate any guidance. Thanks!

@keyonvafa
Owner

Hi, hmm. I haven't tried running with 800 authors but that shouldn't be the nan issue (each author is only adding one extra parameter to the model). Out of curiosity, what happens if you keep the dataset the same but change the author indices so that there are only 2 authors (i.e. incorrectly label the authors)? I assume the nans would still be there, but if they're not, that would confirm that the issue is with the number of authors.

Are you using the TensorFlow or PyTorch implementation? And what is the vocabulary size and the number of documents you're using?
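The relabeling check suggested above can be sketched roughly as follows. This is only an illustration, assuming the author indices are available as a NumPy integer array with one entry per document; the names `author_indices` and `relabel_to_two_authors` are hypothetical and not taken from the repository.

```python
import numpy as np


def relabel_to_two_authors(author_indices):
    """Collapse all author ids onto {0, 1} (deliberately incorrect labels).

    The dataset itself is untouched; only the author labels change, so if
    the NaNs disappear the number of authors is implicated.
    """
    author_indices = np.asarray(author_indices)
    return author_indices % 2


# Hypothetical example: six documents written by authors with large ids.
author_indices = np.array([0, 5, 799, 12, 5, 430])
debug_indices = relabel_to_two_authors(author_indices)
num_debug_authors = int(debug_indices.max()) + 1
```

The model would then be fit with `debug_indices` and two authors in place of the real labels, keeping everything else the same.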

@josuetsm
Author

Sorry for the late reply, I had to pause the project for a while. Changing the author indices so that only 2 authors remained solved the problem. However, days later I was able to find the root of my problem, and it was not the number of authors. The problem arose because I had authors with 0 vocabulary words, and the optimization placed a 0 in the rate parameter of the Poisson distribution, generating NaNs in the log_prob.
After eliminating the authors with 0 words in the vocabulary, I have been able to estimate ideal points for datasets with 100,000 authors.
Thanks for the answer and for the excellent tutorial on Google Colab.
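For readers hitting the same NaNs, here is a minimal sketch of the two points in the comment above: how a zero Poisson rate can yield NaN in a log-probability, and one way to drop authors with no vocabulary words before fitting. It assumes a dense document-by-vocabulary count matrix and a per-document author index array. The function names are hypothetical, and the log-probability is the textbook formula written out naively, not necessarily what the TensorFlow or PyTorch implementation computes internally.

```python
from math import lgamma

import numpy as np


def naive_poisson_log_prob(count, rate):
    """Textbook Poisson log-probability: count*log(rate) - rate - log(count!)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return count * np.log(rate) - rate - lgamma(count + 1)


def drop_empty_authors(counts, author_indices):
    """Remove authors whose documents contain no vocabulary words.

    counts: (num_documents, vocab_size) array of word counts (assumed dense).
    author_indices: (num_documents,) integer array of author ids.
    Returns the filtered counts, contiguous re-numbered author ids, and a
    boolean mask over the original author ids marking which were kept.
    """
    counts = np.asarray(counts)
    author_indices = np.asarray(author_indices)
    num_authors = int(author_indices.max()) + 1
    # Total number of vocabulary words attributed to each author.
    words_per_author = np.bincount(
        author_indices, weights=counts.sum(axis=1), minlength=num_authors
    )
    keep_author = words_per_author > 0
    keep_document = keep_author[author_indices]
    # Map surviving author ids onto 0..(num_kept - 1).
    new_ids = np.cumsum(keep_author) - 1
    return (
        counts[keep_document],
        new_ids[author_indices[keep_document]],
        keep_author,
    )


# With rate 0 and count 0, the term 0 * log(0) evaluates to 0 * -inf = nan.
log_prob_at_zero_rate = naive_poisson_log_prob(0, 0.0)

# Hypothetical toy data: author 1 has two documents, both with no words.
toy_counts = np.array([[1, 0], [0, 0], [0, 2], [0, 0]])
toy_authors = np.array([0, 1, 2, 1])
filtered_counts, filtered_authors, kept = drop_empty_authors(toy_counts, toy_authors)
```

Because the surviving authors are re-numbered, any separate author name list would need to be filtered with the same `kept` mask so that names and indices stay aligned.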

@keyonvafa
Owner

Great, I'm glad it's working now. And thank you!
