
What is B? #1

Open
felixhao28 opened this issue Aug 8, 2018 · 3 comments

Comments

@felixhao28

felixhao28 commented Aug 8, 2018

I am trying to follow your code but here is where I get lost:

            self.B = input_.data.new(input_.size()).bernoulli_(self.p)
            self.noise = self.U * self.B

What is the purpose of B? To simulate some kind of dropout for the noise? Is it mentioned in the paper somewhere?
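For reference, here is a plain-Python sketch (my own, not from the repo, which uses PyTorch) of what the two quoted lines appear to do, assuming `self.U` holds a standard Gumbel sample drawn earlier in the file:

```python
import math
import random

random.seed(0)

def masked_gumbel_noise(n, p):
    """Sketch of the quoted lines: draw Gumbel noise, then multiply by a
    Bernoulli(p) mask so each noise entry is kept with probability p and
    zeroed otherwise (a dropout-style mask on the noise, not the units)."""
    out = []
    for _ in range(n):
        u = random.random()                      # U ~ Uniform(0, 1)
        g = -math.log(-math.log(u))              # standard Gumbel sample
        b = 1.0 if random.random() < p else 0.0  # B ~ Bernoulli(p), cf. bernoulli_(self.p)
        out.append(g * b)                        # cf. self.noise = self.U * self.B
    return out

noise = masked_gumbel_noise(8, 0.5)
```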

Thanks in advance.

source: https://github.com/zhuohan123/g2-lstm/blob/master/language-modeling/g2_lstm.py#L42

@wenhuchen

I think his code is totally different from the paper.

@zhuohan123
Owner

It is dropout applied to the Gumbel noise. Please check the README for details.

@felixhao28
Author

Thanks. Somehow I missed that part of the README.

In our experiment, we arbitrarily set p=0.5, but the loss stopped decreasing after a few epochs. We then removed self.B entirely, and training continued as normal. In the end, the outputs of the LSTM gates were more skewed towards a Bernoulli distribution (0 and 1) than before, but the end-to-end accuracy was slightly lower compared to a plain LSTM. So my conclusion is that G2-LSTM is not a universal drop-in improvement for every task. The idea is very profound though.

Mathematically, does it even make sense to apply such dropout to the Gumbel noise? Randomly zeroing the noise for a portion of the population just creates a mixture of two distributions.
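To make that concrete, here is a quick simulation (my own, under the assumption that the mask simply zeroes the noise): the masked noise splits into a point mass at zero with weight 1 − p plus a standard Gumbel component with weight p.

```python
import math
import random

random.seed(1)
p = 0.5
n = 20000
samples = []
for _ in range(n):
    g = -math.log(-math.log(random.random()))  # Gumbel sample
    b = 1.0 if random.random() < p else 0.0    # Bernoulli(p) mask
    samples.append(g * b)

# Fraction of exact zeros approximates 1 - p (the point-mass component).
zero_frac = sum(1 for s in samples if s == 0.0) / n
# The surviving samples still look Gumbel: their mean approximates the
# Euler-Mascheroni constant (~0.577), the mean of a standard Gumbel.
nonzero = [s for s in samples if s != 0.0]
mean_nonzero = sum(nonzero) / len(nonzero)
```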

And just out of curiosity, have you tried applying the same trick to GRU gates?
