
Conversation

@yweng0828
Collaborator

Set head_size and head_dim when converting the Eagle checkpoint.

This decouples the head_size and head_dim of the main model and the Eagle model, since their values are not necessarily equal to hidden_size / num_attention_heads (for example, Mistral-NeMo).
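For illustration, a minimal sketch (not the actual converter code; the dict key names are assumptions) of preferring an explicit head_dim from the HF config over the derived value:

```python
def resolve_head_dim(hf_config: dict) -> int:
    """Prefer an explicit head_dim from the HF config.

    Mistral-NeMo, for example, sets head_dim=128 while
    hidden_size / num_attention_heads = 5120 / 32 = 160.
    """
    explicit = hf_config.get("head_dim")
    if explicit is not None:
        return explicit
    # Fall back to the conventional derivation when head_dim is absent.
    return hf_config["hidden_size"] // hf_config["num_attention_heads"]


# Hypothetical usage during checkpoint conversion: the Eagle draft model's
# config gets its own head_size / head_dim, independent of the target model.
eagle_hf_config = {"hidden_size": 5120, "num_attention_heads": 32, "head_dim": 128}
converted_config = {
    "head_size": resolve_head_dim(eagle_hf_config),  # assumed key name
    "head_dim": resolve_head_dim(eagle_hf_config),   # assumed key name
}
```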

@brb-nv brb-nv self-requested a review April 16, 2025 16:02
Collaborator

@brb-nv brb-nv left a comment


Changes look good to me.

@yweng0828
Collaborator Author

Cherry-picked to #3593, closing this PR.

@yweng0828 yweng0828 closed this Apr 17, 2025
