Need to modify the Gemma model implementation with the following changelist over the original Gemma:
Attention logit softcapping:

```python
if self.config.attn_logit_softcapping is not None:
    attn_weights = attn_weights / self.config.attn_logit_softcapping
    attn_weights = torch.tanh(attn_weights)
    attn_weights = attn_weights * self.config.attn_logit_softcapping
```
Final logit softcapping:

```python
if self.config.final_logit_softcapping is not None:
    logits = logits / self.config.final_logit_softcapping
    logits = torch.tanh(logits)
    logits = logits * self.config.final_logit_softcapping
```
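For context, both snippets apply the same tanh softcapping trick. Here is a minimal standalone sketch of its effect; the `softcap` helper and the cap value `50.0` are illustrative, not part of the issue:

```python
import torch

def softcap(x: torch.Tensor, cap: float) -> torch.Tensor:
    """Smoothly squash values into (-cap, cap): divide, tanh, rescale."""
    return cap * torch.tanh(x / cap)

logits = torch.tensor([-1000.0, -10.0, 0.0, 10.0, 1000.0])
print(softcap(logits, 50.0))
# -> approximately [-50.0, -9.87, 0.0, 9.87, 50.0]: extreme values are
#    bounded smoothly, while small values pass through almost unchanged.
```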
Scale attention queries using `query_pre_attn_scalar` instead of `1/sqrt(head_dim)` (see the sketch below).
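A minimal sketch of that change, assuming the query scale becomes `query_pre_attn_scalar**-0.5` (the `attn_scores` helper is illustrative; in the real module the scalar would come from `self.config`):

```python
import torch

def attn_scores(q: torch.Tensor, k: torch.Tensor,
                query_pre_attn_scalar: float | None = None) -> torch.Tensor:
    # q, k: (batch, num_heads, seq_len, head_dim)
    head_dim = q.size(-1)
    # Original Gemma scales queries by 1/sqrt(head_dim); the change here
    # scales by 1/sqrt(query_pre_attn_scalar), which need not equal head_dim.
    scalar = query_pre_attn_scalar if query_pre_attn_scalar is not None else head_dim
    return torch.matmul(q * scalar**-0.5, k.transpose(-2, -1))
```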
Implemented in #490.