Convert a safetensors checkpoint from the Hugging Face Hub
#1662
Conversation
This is awesome! Left some initial comments. The main thing we need is a testing plan.
Are there some small checkpoints we could just load over the network from Hugging Face? We'd annotate these as "large" tests, as we do elsewhere.
Other ideas? We definitely want something that will raise alarm bells if we break the conversion path.
… to match the keras weights
Looks great! Left some comments, but they are all pretty small nits.
Tests are running now. We can ignore Keras 2 failures, they are unrelated. But any other failures that pop up would be worth digging into.
Thanks for this!
    ),
)
port_weight(
    keras_variable=decoder_layer._self_attention_layer._key_dense.variables[
General comment for all of these: can't you use kernel instead of variables[0]? And bias, etc. The variable names will be a lot more readable.
The only reason I was shying away from using kernel and bias is that it is not consistent across all the layers. An embedding layer has embeddings, a normalization layer has scale, and a dense layer has kernel. If you want me to make the changes, I would need to run the scripts and be sure about the variable names.
What would you like me to do?
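The trade-off discussed above can be sketched with a small helper that prefers a readable named attribute and falls back to positional access. This is a sketch, not code from the PR: `named_weight` is a hypothetical helper, and the attribute names (`kernel`, `embeddings`, `scale`) follow common Keras conventions as described in the comment above. Dummy classes stand in for real Keras layers so the example runs without Keras installed.

```python
def named_weight(layer, name, index):
    """Return layer.<name> when the layer exposes it (e.g. a Dense layer's
    kernel, an Embedding layer's embeddings, a norm layer's scale),
    otherwise fall back to positional access via layer.variables[index]."""
    weight = getattr(layer, name, None)
    return weight if weight is not None else layer.variables[index]


# Minimal stand-ins to illustrate both code paths without requiring Keras.
class DenseLike:
    def __init__(self):
        self.kernel = "kernel-weight"
        self.variables = [self.kernel]


class OpaqueLayer:
    def __init__(self):
        self.variables = ["first-variable"]


print(named_weight(DenseLike(), "kernel", 0))    # readable named access
print(named_weight(OpaqueLayer(), "kernel", 0))  # positional fallback
```

A helper like this would let the conversion script stay readable for layers with well-known attribute names while still working for layers where only the variables list is known.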
All good to leave as is then! Maybe something we can do as a follow up. Should be pretty easy to do safely now that we have testing in place.
@mattdangerw the tests that fail are likely due to the missing dependency.
Adding to requirements-common.txt should do it, I think?
@mattdangerw all the tests pass 🥳
Thanks, Aritra! Looks great! Just left some minor comments!
        return load_gemma_backbone(cls, preset, load_weights)
    if cls.__name__ == "Llama3Backbone":
        return load_llama3_backbone(cls, preset, load_weights)
    raise ValueError(f"No conversion huggingface/transformers to {cls}")
If a user doesn't know that conversion is required to load a Transformers checkpoint in Keras and tries to load a Transformers checkpoint for which conversion isn't supported, they'll end up here, right? Similar to #1574
In that case, I think it'd be nice to have an error message telling the user that if conversion is not supported, they can switch to loading a Keras checkpoint if one is available.
I have modified the ValueError message. Let me know if that was what you wanted.
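A class-name dispatch with a more descriptive error might look like the following. This is a sketch under assumptions: the two loader functions are stand-in stubs mirroring the snippet above, and the error message wording is illustrative, not the PR's actual text.

```python
def load_gemma_backbone(cls, preset, load_weights):
    return f"gemma backbone from {preset}"  # stub for illustration


def load_llama3_backbone(cls, preset, load_weights):
    return f"llama3 backbone from {preset}"  # stub for illustration


def load_transformers_backbone(cls, preset, load_weights=True):
    """Route a Transformers-format checkpoint to the matching converter,
    failing with guidance when no converter exists for the class."""
    if cls.__name__ == "GemmaBackbone":
        return load_gemma_backbone(cls, preset, load_weights)
    if cls.__name__ == "Llama3Backbone":
        return load_llama3_backbone(cls, preset, load_weights)
    raise ValueError(
        f"Loading huggingface/transformers checkpoints is not supported "
        f"for {cls.__name__}. Try loading a Keras-format checkpoint for "
        f"this architecture if one is available."
    )
```

The key design point from the review: the error names the unsupported class and points the user toward Keras-format checkpoints instead of failing opaquely.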
    Returns:
        backbone: Initialized Keras model backbone.
    """
    transformers_config = load_config(preset, "config.json")
We have a constant for config.json here. We have a plan to change the name of this file in the future, so using the constant would make future changes easier.
Here the config.json comes from the Hugging Face repository. I have added another constant to support this file name and am now using the constant. Does the current implementation look good?
The PR title gives me the impression that any checkpoint in the safetensors format can be loaded. But from what I understand it is probably not doing that. It is adding support for loading Transformers (the library) formatted checkpoints that have safetensors weights. If so, it might be nice to modify the title accordingly. I am happy to stand corrected if my understanding is wrong.
This reverts commit c459519.
Problem Statement
With Keras NLP, you can download models from the Hugging Face Hub using the
from_preset()
method, provided the model is in a Keras-specific format. Using any model from the Hub requires the following workflow:
Why is this important?
This opens up a lot of opportunities. If we have a model architecture defined in both Keras NLP and Hugging Face, we are not tied to either platform. One user can fine-tune the model with Hugging Face and upload it to the Hub, while another can download the fine-tuned model and use it as a Keras NLP model with any backend (TensorFlow, PyTorch, JAX).
This PR will make it possible for us to load checkpoints from the Hugging Face Hub into Keras NLP in a format agnostic manner.
How to use
With the current state of the PR, one can use the following code to load any Gemma or Llama3 checkpoint as a Keras NLP model.
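A usage sketch along those lines is below. Assumptions to flag: the "hf://" preset scheme is what this PR introduces, but the exact repo handle ("google/gemma-2b") is illustrative, and actually running the loader requires keras_nlp plus network access, which is why the call is wrapped in a function rather than executed at import time.

```python
def load_gemma_from_hub(preset="hf://google/gemma-2b"):
    """Load a Transformers-format Gemma checkpoint from the Hugging Face
    Hub as a Keras NLP backbone. The hf:// scheme routes from_preset()
    through the conversion path added by this PR; the repo handle here
    is a hypothetical example. Requires keras_nlp and network access."""
    import keras_nlp  # imported lazily so the sketch parses without it

    return keras_nlp.models.GemmaBackbone.from_preset(preset)
```

Because from_preset() dispatches on the preset string, the same call shape works for Llama3 by swapping in Llama3Backbone and a Llama 3 repo handle.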
Acknowledgements
Thanks to Matthew Carrigan for his early feedback on the APIs and ideas.
The current PR is based on top of @mattdangerw's work. One can find his code here.
CC: @mattdangerw