Conversation
Awesome PR! 🥳 CC @Cyrilvallez
Force-pushed from 48303e6 to d942994
Rebased for failing CIs
ArthurZucker
left a comment
Hey~
As a first review, let's try to leverage inheritance more.
If you look at PhiAttention, a lot of it can be skipped because it already exists in Gemma, for example. So you can add only what's required and use del to remove what needs removing (for example, dense is o_proj, I think).
PhiForCausalLM and the rest should inherit from Gemma rather than Llama, as they can be 1-1 the same that way!
Congrats on the PR 🤗
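A minimal sketch of the inheritance-plus-`del` pattern being suggested, using plain-Python stand-ins for the real transformers classes (the `Linear` stub, sizes, and constructor signatures are assumptions; the real modular files inherit from the actual Gemma `nn.Module` classes):

```python
class Linear:
    """Tiny stand-in for torch.nn.Linear, just to keep the sketch runnable."""
    def __init__(self, in_features, out_features, bias=True):
        self.in_features = in_features
        self.out_features = out_features
        self.has_bias = bias

class GemmaAttention:
    def __init__(self, hidden_size):
        self.q_proj = Linear(hidden_size, hidden_size)
        self.k_proj = Linear(hidden_size, hidden_size)
        self.v_proj = Linear(hidden_size, hidden_size)
        self.o_proj = Linear(hidden_size, hidden_size)  # Phi names this "dense"

class PhiAttention(GemmaAttention):
    """Inherit everything from Gemma, then patch only the differences."""
    def __init__(self, hidden_size):
        super().__init__(hidden_size)
        # Phi checkpoints name the output projection "dense", so rename it
        # and delete the Gemma-only attribute:
        self.dense = self.o_proj
        del self.o_proj
```

With this pattern the child class carries only the diff against Gemma, which is the whole point of the modular files.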
Cyrilvallez
left a comment
Thanks for the PR! 🤗 I left some comments to use more inheritance; you could try to go even further if you can!
This cannot use Llama here, as it would cause self.classifier to become self.score, which may break weight initialization for pretrained models. You could try to find another model with the same names in the init 🤗
I found LlamaForTokenClassification to be closest to PhiForTokenClassification. It is itself copied from MptForTokenClassification, but subclassing that causes issues with build_phi_alibi_tensor (i.e. build_mpt_alibi_tensor).
I've addressed the issue by deleting self.score and adding self.classifier (basically renaming) from LlamaForTokenClassification.
@Cyrilvallez Let me know if changes are required.
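The naming concern above can be illustrated with a small sketch: checkpoint keys follow attribute names, so a pretrained head stored under `classifier.*` cannot load into a module whose attribute is `score` (the string head and constructor shapes here are illustrative stand-ins, not the real nn.Modules):

```python
class LlamaForTokenClassification:
    def __init__(self):
        self.score = "linear-head"  # Llama stores the head under "score"

class PhiForTokenClassification(LlamaForTokenClassification):
    def __init__(self):
        super().__init__()
        # Phi checkpoints key the head as "classifier.*", so rename the
        # attribute; state_dict keys are derived from attribute names.
        self.classifier = self.score
        del self.score
```

After the rename, the instance exposes only `classifier`, matching what the pretrained Phi checkpoints contain.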
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from d942994 to 12b3901
Feel free to ping us again once ready! 🤗
@ArthurZucker Oh, I think that's all. Suggest any changes if required.
All this code should not be needed; a single pass should be enough, as this is identical to GemmaForCausalLM.
We can't remove the forward method completely, as it is required at least for the documentation. Changed it to super().forward(.....). @Cyrilvallez
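The pattern being described, keeping a thin forward only so documentation can attach to it while delegating to the identical parent implementation, can be sketched as (class names and signatures are simplified stand-ins for the real transformers classes):

```python
class GemmaForCausalLM:
    def forward(self, input_ids=None, labels=None, **kwargs):
        # The parent holds the actual (shared) implementation.
        return {"input_ids": input_ids, "labels": labels}

class PhiForCausalLM(GemmaForCausalLM):
    def forward(self, input_ids=None, labels=None, **kwargs):
        """Phi-specific docstring lives here so the docs can render it;
        the body simply delegates to the parent implementation."""
        return super().forward(input_ids=input_ids, labels=labels, **kwargs)
```

The child adds no logic of its own, so there is no duplicated code to drift out of sync, yet the documented signature and docstring remain Phi-specific.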
Force-pushed from d67d5b8 to 89199ac
Rebased. @Cyrilvallez
Cyrilvallez
left a comment
Hey! Sorry for the delay!! Last small comments to cut even more code, then we're good to go! 🤗
Hi @Cyrilvallez, I'm concerned that removing this will result in the lm_head having bias=False, as it does in Gemma, and I suspect that would affect the model initialization, or at least the output. I don't think we can neglect the bias, so I kept the lm_head with bias=True in PhiForCausalLM.
Let me know if I'm correct or not.
Oh yes, sorry, I did not notice the bias! Then of course we need to keep it. You can still remove the self.model = ... and the post_init() lines, though.
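The outcome of this exchange can be sketched as follows: the child keeps only the line that differs (the biased lm_head), while self.model and post_init() are inherited from the parent (plain-Python stand-ins; the real classes wrap torch.nn.Linear and a decoder stack):

```python
class Linear:
    """Stand-in for torch.nn.Linear; only tracks the bias flag."""
    def __init__(self, in_features, out_features, bias=True):
        self.bias = bias

class GemmaForCausalLM:
    def __init__(self, hidden_size, vocab_size):
        self.model = "decoder-stack"                        # inherited as-is
        self.lm_head = Linear(hidden_size, vocab_size, bias=False)
        self.post_init()                                    # inherited as-is

    def post_init(self):
        pass  # weight-init hook in the real classes

class PhiForCausalLM(GemmaForCausalLM):
    def __init__(self, hidden_size, vocab_size):
        super().__init__(hidden_size, vocab_size)
        # Phi checkpoints include an lm_head bias, so override only this line;
        # self.model and post_init() come from the parent.
        self.lm_head = Linear(hidden_size, vocab_size, bias=True)
```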
Force-pushed from 77d1fd3 to 67ebbaf
Cyrilvallez
left a comment
All right, LGTM! Great job thanks!
If #34858 gets merged before this one, you'll just need to make sure to modify accordingly, but it will be extremely straightforward!
🤗
…amline initialization
…e past_key_values type
…osition embeddings handling
Force-pushed from 67ebbaf to 9bf4eae
Done! @Cyrilvallez
ArthurZucker
left a comment
Nice! Do you want to include the recent changes in #35235 ? 🤗
class PhiForCausalLM(PhiPreTrainedModel, GenerationMixin):
    _tied_weights_keys = ["lm_head.weight"]
    _tp_plan = {"lm_head": "colwise_rep"}
for TP, the config needs TP as well!
    config.hidden_size // self.num_heads, eps=config.layer_norm_eps, elementwise_affine=True
)

self.rotary_emb = PhiRotaryEmbedding(config=self.config)
What does this PR do?
Adds Modular Phi #33916
Who can review?
@ArthurZucker @LysandreJik
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.