
Setting of β #32

Open
tangbohu opened this issue Aug 9, 2018 · 1 comment

Comments

tangbohu commented Aug 9, 2018

Hi.

In the paper, the authors said: "As for parameter β in eq. 2, it usually varies about 0.1, as we set it to 10^3 divided by number of elements in attention map and batch size for each layer."

But I am still confused. What does 10^3 mean, and how was the value 0.1 obtained?


d-li14 commented Aug 25, 2018

@tangbohu I assume that β is 10^3 / batch_size / (feature_map_size)^2. This division happens in the averaging function here in practice. The batch size is set to 128 by default, and the feature map size varies over 32x32, 16x16, and 8x8, so the aforementioned expression varies around 0.1. This is just my own conjecture from the code.
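A quick sketch of this conjecture (my own arithmetic, not code from the repo): plugging the default batch size of 128 and the three feature-map sizes into 10^3 / batch_size / (feature_map_size)^2 gives values on the order of 0.1:

```python
# Conjectured effective beta per stage:
#   beta = 10^3 / batch_size / (number of elements in the attention map)
# batch_size = 128 is the repo's default; 32x32, 16x16, 8x8 are the
# spatial sizes of the attention maps at the three network stages.
batch_size = 128

for side in (32, 16, 8):
    num_elements = side * side
    beta = 1e3 / batch_size / num_elements
    print(f"{side}x{side}: beta = {beta:.5f}")
```

This prints roughly 0.00763, 0.03052, and 0.12207 for the three stages, so "varies about 0.1" seems to describe the order of magnitude rather than an exact value.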
