[WIP] Add GeoV model #22403
Conversation
Hey @vpj, feel free to ping me for any guidance! 😉 Also, if you need a review, tell me.

Yeah, I need a review. I'm new to Hugging Face Transformers; I just went by the tutorials. Let me know what else needs to be done in order to merge this. Thanks.

Sure! Reviewing now!
ArthurZucker
left a comment
Good work! It's already pretty clean.
My main comments are about the naming and the missing Copied from statements.
Plus, I am pretty sure we don't need a new tokenizer.
Let's change the order of the classes: general functions should be at the beginning. It's a nit, but it's a convention that makes it easier to read the entire file! Follow, for example, what you can see in T5 or any other model.
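A rough sketch of that layout convention, with hypothetical Geov names only for illustration: helper functions and generic modules first, the PreTrainedModel subclass and heads afterwards, as in modeling_t5.py:

import torch.nn as nn
from transformers import PreTrainedModel

def rotate_half(x):  # small general-purpose helpers near the top of the file
    ...

class GeovAttention(nn.Module):  # generic building-block modules next
    ...

class GeovPreTrainedModel(PreTrainedModel):  # base class and task heads afterwards
    ...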
I used what was in GPT-NeoX and moved the PreTrainedModel class down; let me know if it's ok.
model.to(torch_device)
inputs = tokenizer("My favorite food is", return_tensors="pt").to(torch_device)
expected_output = "My favorite food is pizza. I love pizza. I love pizza. I"
Suggested change:
- expected_output = "My favorite food is pizza. I love pizza. I love pizza. I"
+ EXPECTED_OUTPUT = "My favorite food is pizza. I love pizza. I love pizza. I"
Also do you mind adding a test with the logits of the model?
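For reference, what is presumably meant is an integration test that compares a slice of the model's output logits against reference values, roughly like this sketch (the checkpoint id is a placeholder and the expected values must be filled in from a run of the original implementation):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def check_geov_logits(checkpoint="GeoV/GeoV-9b", device="cpu"):  # placeholder checkpoint id
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    inputs = tokenizer("My favorite food is", return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Placeholder reference values; replace with the slice computed by the original model.
    expected_slice = torch.zeros(5)
    torch.testing.assert_close(logits[0, -1, :5].cpu(), expected_slice, atol=1e-4, rtol=1e-4)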
Sorry, I didn't get what you meant about the test with the logits of the model.
This model should be almost entirely the same as Reformer or GPT-NeoX, so let's add Copied from comments wherever we can! Also, we should rename every GeoV to Geov in the class names; it's going to be more convenient.
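For context, the Copied from convention is a comment placed above a duplicated class or function; the repo's consistency checks then keep the copy in sync with its source. A hedged sketch of what it could look like here, assuming GPT-NeoX as the source (the Geov naming and the choice of source module are assumptions):

import torch.nn as nn

# The comment below tells the repo's copy checker that this class must stay
# identical to the referenced GPT-NeoX class, with the stated rename applied.
# Copied from transformers.models.gpt_neox.modeling_gpt_neox.GPTNeoXMLP with GPTNeoX->Geov
class GeovMLP(nn.Module):
    ...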
Why rename GeoV to Geov?
Pretty sure we don't need a new tokenizer for this. We just have to add \n as a special token and it will not be processed by the spm model (consider looking at the Reformer or BigBird tokenizers, which should be usable).
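A minimal sketch of that suggestion, assuming an existing sentencepiece-based tokenizer such as ReformerTokenizer can be reused and that "spiece.model" is the GeoV vocabulary file (both are assumptions, not something from this PR):

from transformers import ReformerTokenizer

tokenizer = ReformerTokenizer(vocab_file="spiece.model")  # hypothetical vocab file
# Registering "\n" as an additional special token keeps it out of the spm model:
# it is split off before sentencepiece runs and gets its own id in the vocab.
tokenizer.add_special_tokens({"additional_special_tokens": ["\n"]})
print(tokenizer.tokenize("hello\nworld"))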
Looking into it
I previously tried adding \n as a special token. But from tokenization_utils.py and tokenization_utils_base.py it looked to me like new tokens get assigned the id len(self) + 1, whereas we need to add a special token with a specific id. I went through Reformer but couldn't figure out how to do that. Can you help?
Sure. Give me a bit of time, I'll try to find the best solution to deal with this.
This is why the torch_and_flax test is failing.
@ArthurZucker I very much appreciate the help so far. Can you please help me get this PR ready by tomorrow, since I won't be available for a week after tomorrow? Thank you.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

This error for tests_torch

These two tests seem unrelated to your PR; pull from main and normally they should disappear.

Ok, the history of the PR got a little bit messed up 😅 It's alright, it can happen from time to time! You can either rebase on main starting from

The styling depends on the version of

Yeah, messed up by doing a rebase instead of a merge.

I am not sure what the check means by imports order/format; to me it looks quite similar to other files.

Ok, this is ruff acting up. I use

What should I do? Do I have to install
ArthurZucker
left a comment
Okay, thanks a lot for all the cleanup! With this I can better see the actual changes. Given this, I am not really sure that we have to go through all the trouble of adding everything to transformers!
The easiest way to share the model is to put it on the hub using this tutorial.
I am sorry if this is more work, as you must have created the dev env and so on! But the way the code is now looks good, and it should blend in properly as a custom model.
Especially for the tokenization (you would have had to add a testing file), you can just do something like:

class GeovTokenizer(RoformerTokenizer):
    def __init__(self):
        super().__init__()

    def convert_tokens_to_ids(self, tokens):
        ...

    def convert_ids_to_tokens(self, ids):
        ...

which would be the only things you would have to rewrite! I can also help you as much as I can with adding this to the hub directly!
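If it helps, here is a rough, hedged outline of the custom-model route from that tutorial; the module paths, class names, checkpoint path, and repo id below are placeholders based on this thread, not the actual GeoV code:

from transformers import AutoModelForCausalLM
# Hypothetical modules sitting next to this script.
from configuration_geov import GeovConfig
from modeling_geov import GeovForCausalLM

# Registering the classes makes save_pretrained/push_to_hub copy the custom code
# into the repo and fill in the auto_map entries of config.json.
GeovConfig.register_for_auto_class()
GeovForCausalLM.register_for_auto_class("AutoModelForCausalLM")

model = GeovForCausalLM.from_pretrained("path/to/converted/checkpoint")  # placeholder path
model.push_to_hub("GeoV/GeoV-9b")  # placeholder repo id

# Anyone can then load it without a transformers PR:
model = AutoModelForCausalLM.from_pretrained("GeoV/GeoV-9b", trust_remote_code=True)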
return outputs

@classmethod
I don't see it on the attention class, but yes, if the whole attention class is wrapped, you don't need to put it everywhere.
Oh thanks, I didn't know that. I saw https://huggingface.co/docs/transformers/add_new_model and thought I had to create a pull request. So, just to make sure I'm clear, should I close this PR and share the model according to https://huggingface.co/docs/transformers/custom_models?

Yes! That would be best 😉 Thanks for your understanding!

Just out of curiosity, how do you choose which models to add to the repo and what goes to the hub?
Added it to the hub, but it doesn't work with pipelines (text-generation). How do I register the model for that? This is what I'm doing now; trying to load the pipeline gives an error. Thanks.

The more we grow, the more we are trying to add models to the hub! Especially if the model does not have a lot of changes compared to a model that we already support! For the issue, I think you have to update the mapping.

How can I change the mapping?
The same way you did for the config.json:
...
"auto_map": {
"AutoConfig": "configuration_glm.GLMConfig",
"AutoModel": "modeling_glm.GLMModel",
"AutoModelForSeq2SeqLM": "modeling_glm.GLMForConditionalGeneration",
"AutoModelForMultipleChoice": "modeling_glm.GLMForMultipleChoice",
"AutoModelForSequenceClassification": "modeling_glm.GLMForSequenceClassification"
},
...
So my bad, you just need to add

This is a causal LM; is it ok to add it to masked LM?

Ah sorry, for causal LM it should be
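For reference, once the auto_map points a causal-LM auto class (presumably AutoModelForCausalLM) at the custom modeling file, the text-generation pipeline should be loadable roughly like this sketch (the repo id is a placeholder):

from transformers import pipeline

# trust_remote_code lets the pipeline import the custom classes referenced
# by the auto_map in the hub repo's config.json.
generator = pipeline(
    "text-generation",
    model="GeoV/GeoV-9b",  # placeholder repo id
    trust_remote_code=True,
)
print(generator("My favorite food is", max_new_tokens=10))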
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR adds the 9B parameter GeoV language model trained by Georges Harik.