
May I ask you a question about the "scale_emb_or_prj" parameter? #210

Open
aitch25 opened this issue Jun 12, 2023 · 0 comments

aitch25 commented Jun 12, 2023

May I ask you a question about the "scale_emb_or_prj" parameter?

As far as I know, the authors of "Attention Is All You Need" used both the 'prj' and 'emb' scalings together in their model. But your code selects only one of the two ('prj' or 'emb'), or 'none'. May I ask why you departed from the original paper here?

I assume you designed it this way for a reason, such as performance.

Or, if I misunderstood the original paper, please kindly let me know.
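To make sure I am asking the right thing, here is a small self-contained sketch of the scalings as I understand them (the names emb/prj and the sizes are just placeholders, not your actual code):

    import torch
    import torch.nn as nn

    d_model, vocab_size = 512, 1000
    emb = nn.Embedding(vocab_size, d_model)
    prj = nn.Linear(d_model, vocab_size, bias=False)
    prj.weight = emb.weight  # weight sharing, as in section 3.4 of the paper

    tokens = torch.randint(0, vocab_size, (2, 7))

    # The three options your code offers (only one scaling at a time):
    logits_emb = prj(emb(tokens) * d_model ** 0.5)   # 'emb'
    logits_prj = prj(emb(tokens)) * d_model ** -0.5  # 'prj'
    logits_none = prj(emb(tokens))                   # 'none'

    # My reading of the paper: both scalings applied together
    logits_both = prj(emb(tokens) * d_model ** 0.5) * d_model ** -0.5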

Thank you, as always, for your wonderful project.

=== in "transformer.Models.py" ============================================

    self.src_pad_idx, self.trg_pad_idx = src_pad_idx, trg_pad_idx

    # In section 3.4 of the paper "Attention Is All You Need", there is this detail:
    # "In our model, we share the same weight matrix between the two
    # embedding layers and the pre-softmax linear transformation...
    # In the embedding layers, we multiply those weights by \sqrt{d_model}".
    #
    # Options here:
    #   'emb': multiply the embedding output by \sqrt{d_model}
    #   'prj': multiply the linear projection output by (\sqrt{d_model})^-1
    #   'none': no multiplication

    assert scale_emb_or_prj in ['emb', 'prj', 'none']
    self.scale_emb = (scale_emb_or_prj == 'emb') if trg_emb_prj_weight_sharing else False
    self.scale_prj = (scale_emb_or_prj == 'prj') if trg_emb_prj_weight_sharing else False
    self.d_model = d_model

=================================================================
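If I follow the rest of Models.py correctly, these two flags are then consumed in the forward passes roughly like this (my paraphrase from memory, not a verbatim quote of your code):

    # inside the encoder/decoder forward pass:
    enc_output = self.src_word_emb(src_seq)
    if self.scale_emb:                         # the 'emb' option
        enc_output *= self.d_model ** 0.5

    # inside Transformer.forward, after decoding:
    seq_logit = self.trg_word_prj(dec_output)
    if self.scale_prj:                         # the 'prj' option
        seq_logit *= self.d_model ** -0.5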
