Add BloomModel hydra support #129
@Dahoas can you give this a quick glance?
Bloom rewards seem a bit low relative to other models :(. But not much we can really do, I guess.
To be expected.
# 1.0 in head_mask indicate we keep the head
# attention_probs has shape batch_size x num_heads x N x N
# head_mask has shape n_layer x batch x num_heads x N x N
head_mask = self.get_head_mask(head_mask, hf_get_num_hidden_layers(self.config))
What is the head mask?
The head mask is a binary mask that can be used to drop the self-attention weights (softmax(qk)) from specified heads before computing the full attention output. For example, see here. get_head_mask just expands the dimensions to line up with the proper shape.
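To make the masking step concrete, here is a minimal pure-Python sketch of dropping one head's attention weights. It is not the code from this PR or from transformers: the helper names softmax and masked_attention_probs are hypothetical, and it assumes a single batch element and a per-head mask of 1.0 (keep) or 0.0 (drop).

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_probs(scores, head_mask):
    """Hypothetical helper: apply a per-head mask to attention probabilities.

    scores:    nested list of shape [num_heads][N][N] holding raw q.k scores
    head_mask: list of length num_heads; 1.0 keeps a head, 0.0 drops it
    """
    out = []
    for head_scores, keep in zip(scores, head_mask):
        probs = [softmax(row) for row in head_scores]
        # Multiplying by 0.0 zeroes every weight of a dropped head.
        out.append([[p * keep for p in row] for row in probs])
    return out


# Two heads, sequence length 2; keep head 0 and drop head 1.
scores = [
    [[0.0, 1.0], [2.0, 0.0]],
    [[1.0, 1.0], [0.5, 0.5]],
]
probs = masked_attention_probs(scores, head_mask=[1.0, 0.0])
```

In the real model the mask is a tensor broadcast over layers, batch, and positions, which is the shape expansion that get_head_mask takes care of.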
else:
    attention_mask = attention_mask.to(hidden_states.device)

alibi = modeling_bloom.build_alibi_tensor(
I didn't know bloom used alibi
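For context on what an ALiBi tensor contains, here is a rough sketch of the idea: each head gets a fixed slope, and a bias proportional to the key position is added to the attention scores instead of using learned position embeddings. This is not the build_alibi_tensor implementation; the helper names alibi_slopes and alibi_bias are hypothetical, and the sketch assumes the number of heads is a power of two and that no positions are padded.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8 / num_heads).

    Assumption: num_heads is a power of two (other counts need extra handling).
    """
    assert num_heads > 0 and num_heads & (num_heads - 1) == 0
    base = 2.0 ** (-8.0 / num_heads)
    return [base ** (i + 1) for i in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Hypothetical helper: bias of shape [num_heads][seq_len].

    Entry [h][j] is slope_h * j, a per-head linear function of key position j
    that would be added to every query row of head h's attention scores.
    """
    return [[slope * pos for pos in range(seq_len)]
            for slope in alibi_slopes(num_heads)]


slopes = alibi_slopes(8)   # 1/2, 1/4, ..., 1/256
bias = alibi_bias(8, 4)    # one row of 4 key-position biases per head
```

Because softmax is unchanged by adding a constant to a whole row, a bias of slope * j acts like a penalty on the distance between query and key once the causal mask is applied.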
Looks good to me! I just left some questions so I can better understand how things are functioning.
This PR adds hydra-based PPO model branching support for BloomModels.
wandb reports:
Note: The <Arch>ModelBranch implementations should be refactored to share a common interface if we plan on adding more in the near future.