
Issue instantiating a keras_nlp.models.Backbone from a model preset of Hugging Face handles #1574

Closed
ghost opened this issue Apr 12, 2024 · 5 comments

ghost commented Apr 12, 2024

Describe the bug

I am unable to instantiate a keras_nlp.models.Backbone from a model preset given as a Hugging Face handle, and I get the following error:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
config.json: 100% 462/462 [00:00<00:00, 18.6kB/s]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
[<ipython-input-2-085dc66f3fb4>](https://localhost:8080/#) in <cell line: 1>()
----> 1 backbone = keras_nlp.models.Backbone.from_preset("hf://dmis-lab/biobert-v1.1")

1 frames
[/usr/local/lib/python3.10/dist-packages/keras_nlp/src/utils/preset_utils.py](https://localhost:8080/#) in check_config_class(preset, config_file)
    409     with open(config_path) as config_file:
    410         config = json.load(config_file)
--> 411     return keras.saving.get_registered_object(config["registered_name"])
    412 

KeyError: 'registered_name'

To Reproduce
https://colab.research.google.com/drive/1sL3dEM8ZOLCbQ5RM1P6usrrzfqk5aTbQ?usp=sharing

Expected behavior
This line should probably be changed?
https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/utils/preset_utils.py#L411

E.g., a config.json from HF: https://huggingface.co/google-bert/bert-base-uncased/resolve/main/config.json
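
For context, the KeyError happens because a transformers-style config.json (like the one linked above) has no "registered_name" key; it carries keys such as "model_type" and "architectures" instead. A quick way to confirm this from the linked file (nothing KerasNLP-specific below):

import json
from urllib.request import urlopen

# Inspect the Hugging Face config.json linked above; note there is no "registered_name".
url = "https://huggingface.co/google-bert/bert-base-uncased/resolve/main/config.json"
with urlopen(url) as response:
    config = json.load(response)
print(sorted(config))               # includes "architectures", "model_type", ...
print("registered_name" in config)  # False -> the KeyError above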

Would you like to help us fix it?

@mattdangerw
Member

I believe this is because you are attempting to load a huggingface/transformers model as a KerasNLP model. The config and weights are in a fundamentally different format. See #1294 for a bit more of a deep dive here.

I think there are two actionable things here:

  • Long term, we want to offer easy ways to convert transformers models to keras_nlp models, and vice versa. TBD on how exactly that should work. A set of tools shipped with keras-nlp? Suggestions welcome.
  • More immediately, we might want to make sure we can detect this case and give a good error message. Especially because both Transformers and KerasNLP use a config.json. It's confusing!

@Wauplin @SamanehSaadat cc'ing for thoughts.
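
A minimal sketch of what such a detection could look like, modeled on the check_config_class frame in the traceback above (the signature, wording, and placement here are hypothetical, not the actual KerasNLP code):

import json
import keras

def check_config_class(preset, config_path):
    # Hypothetical defensive variant of the function in preset_utils.py.
    with open(config_path) as config_file:
        config = json.load(config_file)
    if "registered_name" not in config:
        # A transformers config.json typically has "model_type"/"architectures"
        # but no "registered_name"; raise a readable error instead of a KeyError.
        raise ValueError(
            f"The config.json in preset '{preset}' is not a KerasNLP preset config "
            "(missing 'registered_name'). It looks like a huggingface/transformers "
            "checkpoint, which cannot be loaded directly as a KerasNLP model."
        )
    return keras.saving.get_registered_object(config["registered_name"])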

@Wauplin
Contributor

Wauplin commented Apr 15, 2024

More immediately, we might want to make sure we can detect this case and give a good error message. Especially because both Transformers and KerasNLP use a config.json. It's confusing!

I see two places in the code where we could add a check: either when we download from the Hub (check the metadata.json file, for instance?) or when we load the model from a local preset. Both are valid IMO. If we add the check when downloading the files from HF, it would be HF-specific and therefore would not apply if a user manually downloads files and tries to load them (you'll have such usage for sure!). If we add the check when loading the files, it would not be specific to HF, which is somewhat better IMO. It would mean checking the consistency of the config.json against what's expected. WDYT?

Agree on the longer-term goal as well, but I'll defer that topic to @Rocketknight1, who's more knowledgeable on the transformers side (and it's most likely to be discussed in a separate issue?).

@Rocketknight1

@mattdangerw I think a short-term check that raises a sensible error makes sense as the first step. Longer term, conversion should be possible: once we detect a transformers checkpoint in KerasNLP, as long as KerasNLP already supports that architecture, we could just have a mapping of config attributes, tokenizer vocab, and layer names to convert the checkpoint.

In theory, simple. In practice, I'm sure there are lots of painful edge cases to worry about, but I suspect we'd get most of the benefit just from supporting a few of the most popular architectures (e.g. Gemma/Llama/Mistral/Mixtral), and maybe that wouldn't be so bad!
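
As one concrete illustration of the config-attribute mapping idea, for BERT only (the left-hand names are from the public bert-base-uncased config.json, the right-hand names are keras_nlp.models.BertBackbone constructor arguments as I understand them; a sketch, not a planned converter):

# Hypothetical transformers -> KerasNLP config mapping, BERT only.
HF_TO_KERAS_NLP_BERT = {
    "vocab_size": "vocabulary_size",
    "num_hidden_layers": "num_layers",
    "num_attention_heads": "num_heads",
    "hidden_size": "hidden_dim",
    "intermediate_size": "intermediate_dim",
    "hidden_dropout_prob": "dropout",
    "max_position_embeddings": "max_sequence_length",
}

def convert_bert_config(hf_config: dict) -> dict:
    # Translate only the overlapping fields; the weights and tokenizer vocab
    # would need their own layer-name and file mappings.
    return {
        keras_name: hf_config[hf_name]
        for hf_name, keras_name in HF_TO_KERAS_NLP_BERT.items()
        if hf_name in hf_config
    }

Something like keras_nlp.models.BertBackbone(**convert_bert_config(hf_config)) could then build the architecture, with the weight copy handled separately.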

@SamanehSaadat
Member

@Wauplin I agree that we should add a check when loading the model so that both local and remote presets are covered. I believe the check should be done at the beginning of our from_preset(). I'll add the check.
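
A rough sketch of that placement, assuming a guard along the lines sketched earlier in the thread (load_preset_config below is a made-up stand-in for however the preset's config.json is actually resolved):

import json

def load_preset_config(preset):
    # Made-up stand-in: resolve and parse the preset's config.json.
    with open(f"{preset}/config.json") as f:
        return json.load(f)

def from_preset(preset, **kwargs):
    # Hypothetical: validate the config first, before any class lookup or weight loading.
    config = load_preset_config(preset)
    if "registered_name" not in config:
        raise ValueError(
            f"Preset '{preset}' does not contain a KerasNLP config.json; "
            "it may be a huggingface/transformers checkpoint."
        )
    # ...continue with the normal KerasNLP loading path...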

@SamanehSaadat
Member

The long-term approach has been addressed in #1662, so I'll close this issue.
