Conversation

@tirthasheshpatel (Contributor) commented on Feb 14, 2024

Closes #1418

Adds `mistral_7b_en` and `mistral_instruct_7b_en` presets for Mistral, along with preset tests for all the Mistral components.
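For reference, a minimal usage sketch of the new presets (the prompt and `max_length` here are illustrative):

```python
import keras_nlp

# Load the instruction-tuned preset added in this PR and run generation.
mistral_lm = keras_nlp.models.MistralCausalLM.from_preset(
    "mistral_instruct_7b_en"
)
mistral_lm.generate("What is Keras?", max_length=64)
```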

@mattdangerw (Member)

Can you show a full generation run with the causal LM class in a Colab, loaded from Kaggle?

@tirthasheshpatel (Contributor, Author)

> Can you show a full generation run with the causal LM class in a Colab, loaded from Kaggle?

Here: https://colab.research.google.com/drive/1508c8IY_nQQIsF33nUc_87l0mLo2BFw0?usp=sharing

@mattdangerw added the kokoro:force-run (Runs Tests on GPU) label on Feb 14, 2024
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Feb 14, 2024
@mattdangerw (Member)

This looks great! Just made the Kaggle model public so we can re-run CI.

I think there is one small thing to fix here. With the Kaggle rewrite, our tokenizers can be created without a vocabulary, which is loaded after the fact. Basically, loading will look like `MistralTokenizer.from_config(config); MistralTokenizer.load_assets(path)`. So we have to accommodate a tokenizer not knowing its vocabulary on creation.
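To make that two-step flow concrete, here's a hedged sketch (`config` and `path` stand in for whatever the saving machinery actually passes):

```python
# Step 1: reconstruct the tokenizer from its config. At this point it
# has no vocabulary, so special token ids are not yet available.
tokenizer = MistralTokenizer.from_config(config)

# Step 2: load the vocabulary assets after the fact. Only now can
# special tokens be looked up safely.
tokenizer.load_assets(path)
```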

To deal with this, we only grab the special tokens from the tokenizer the first time a preprocessing layer is built, not when it is constructed. Basically, we want this...

https://github.com/keras-team/keras-nlp/blob/bc3852f9fc3e9edaab7bc6c5d7ee40e6d3618ac2/keras_nlp/models/bert/bert_preprocessor.py#L141-L156

Instead of this...

https://github.com/keras-team/keras-nlp/blob/bc3852f9fc3e9edaab7bc6c5d7ee40e6d3618ac2/keras_nlp/models/mistral/mistral_preprocessor.py#L122-L132
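As a hedged sketch of the difference (the attribute names mirror the linked files, but treat the exact signatures as illustrative), the packer should be constructed in `build()`, after tokenizer assets are guaranteed to be loaded, rather than in `__init__`:

```python
from keras_nlp.layers import StartEndPacker

class MistralPreprocessorSketch:
    """Illustrates deferring special-token lookup to `build()`."""

    def __init__(self, tokenizer, sequence_length=1024):
        self.tokenizer = tokenizer
        self.sequence_length = sequence_length
        # Don't read `tokenizer.start_token_id` here: the tokenizer may
        # have been created via `from_config()` with no vocabulary yet.
        self.packer = None
        self.built = False

    def build(self, input_shape):
        # By build time, `load_assets()` has run, so special token ids
        # can be read from the tokenizer safely.
        self.packer = StartEndPacker(
            start_value=self.tokenizer.start_token_id,
            end_value=self.tokenizer.end_token_id,
            sequence_length=self.sequence_length,
            return_padding_mask=True,
        )
        self.built = True
```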

You might be able to trigger a bug if you save a `MistralCausalLM` in the `.keras` format, then load the model back and check `causal_lm.preprocessor.packer.start_value`. Not sure, but we should probably check.
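A hedged sketch of that check (file names are arbitrary; the preset name is from this PR):

```python
import keras
import keras_nlp

# Save a preset-loaded causal LM in the .keras format...
causal_lm = keras_nlp.models.MistralCausalLM.from_preset("mistral_7b_en")
causal_lm.save("mistral_causal_lm.keras")

# ...then load it back and inspect the packer's start token. With an
# `__init__`-time lookup, the restored packer could hold a stale or
# unset `start_value`, since it was built before assets were loaded.
restored = keras.models.load_model("mistral_causal_lm.keras")
print(restored.preprocessor.packer.start_value)
print(restored.preprocessor.tokenizer.start_token_id)  # should match
```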

@tirthasheshpatel (Contributor, Author)

Thanks for noticing this @mattdangerw! Should be fixed now. Let me know if it looks good to you!

@mattdangerw (Member) left a comment

lgtm!

@mattdangerw merged commit 70c57cc into keras-team:master on Feb 15, 2024
@tirthasheshpatel deleted the mistral-presets branch on Feb 16, 2024 at 07:58

Labels

type:feature New feature or request

Development

Successfully merging this pull request may close these issues:

Preset and doc for Mistral (multilingual) (#1418)
