Add support for OLMo2 (#1076)
xenova authored Dec 7, 2024
1 parent 6f27a10 commit c850083
Showing 5 changed files with 66 additions and 2 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -371,7 +371,8 @@ You can refine your search by selecting the task you're interested in (e.g., [te
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (from Meta AI) released with the paper [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.
1. **[OLMo](https://huggingface.co/docs/transformers/master/model_doc/olmo)** (from AI2) released with the paper [OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838) by Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi.
1. **[OLMo](https://huggingface.co/docs/transformers/master/model_doc/olmo)** (from Ai2) released with the paper [OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838) by Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi.
1. **[OLMo2](https://huggingface.co/docs/transformers/master/model_doc/olmo2)** (from Ai2) released with the blog [OLMo 2: The best fully open language model to date](https://allenai.org/blog/olmo2) by the Ai2 OLMo team.
1. **OpenELM** (from Apple) released with the paper [OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework](https://arxiv.org/abs/2404.14619) by Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
3 changes: 2 additions & 1 deletion docs/snippets/6_supported-models.snippet
@@ -86,7 +86,8 @@
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (from Meta AI) released with the paper [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.
1. **[OLMo](https://huggingface.co/docs/transformers/master/model_doc/olmo)** (from AI2) released with the paper [OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838) by Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi.
1. **[OLMo](https://huggingface.co/docs/transformers/master/model_doc/olmo)** (from Ai2) released with the paper [OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838) by Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi.
1. **[OLMo2](https://huggingface.co/docs/transformers/master/model_doc/olmo2)** (from Ai2) released with the blog [OLMo 2: The best fully open language model to date](https://allenai.org/blog/olmo2) by the Ai2 OLMo team.
1. **OpenELM** (from Apple) released with the paper [OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework](https://arxiv.org/abs/2404.14619) by Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1 change: 1 addition & 0 deletions src/configs.js
@@ -104,6 +104,7 @@ function getNormalizedConfig(config) {
break;
case 'llama':
case 'olmo':
case 'olmo2':
case 'mobilellm':
case 'granite':
case 'cohere':
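
Note on the hunk above: adding `case 'olmo2':` to the existing group of labels means OLMo2 configs go through the same normalization branch as `llama` and `olmo`. A minimal sketch of that fall-through pattern follows. The function name `getNormalizedConfigSketch` and the key names in `mapping` are assumptions made for illustration, not a copy of `src/configs.js`.

```javascript
// Hypothetical sketch: the mapping keys below are illustrative assumptions,
// not taken from src/configs.js.
function getNormalizedConfigSketch(config) {
  const mapping = {};
  switch (config.model_type) {
    // Fall-through: every label in this group runs the same body.
    case "llama":
    case "olmo":
    case "olmo2":
      mapping.num_layers = "num_hidden_layers";
      mapping.num_heads = "num_key_value_heads";
      break;
    default:
      throw new Error(`Unsupported model type: ${config.model_type}`);
  }

  // Copy each source field to its normalized name.
  const normalized = {};
  for (const [target, source] of Object.entries(mapping)) {
    normalized[target] = config[source];
  }
  return normalized;
}

const normalized = getNormalizedConfigSketch({
  model_type: "olmo2",
  num_hidden_layers: 2,
  num_key_value_heads: 4,
});
console.log(normalized);
```

Grouping labels this way keeps one code path per architecture family, so a new model type that shares a config layout needs only one added line.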
9 changes: 9 additions & 0 deletions src/models.js
@@ -4112,6 +4112,13 @@ export class OlmoModel extends OlmoPreTrainedModel { }
export class OlmoForCausalLM extends OlmoPreTrainedModel { }
//////////////////////////////////////////////////

//////////////////////////////////////////////////
// OLMo2 models
export class Olmo2PreTrainedModel extends PreTrainedModel { }
export class Olmo2Model extends Olmo2PreTrainedModel { }
export class Olmo2ForCausalLM extends Olmo2PreTrainedModel { }
//////////////////////////////////////////////////


//////////////////////////////////////////////////
// Granite models
@@ -6877,6 +6884,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
['codegen', ['CodeGenModel', CodeGenModel]],
['llama', ['LlamaModel', LlamaModel]],
['olmo', ['OlmoModel', OlmoModel]],
['olmo2', ['Olmo2Model', Olmo2Model]],
['mobilellm', ['MobileLLMModel', MobileLLMModel]],
['granite', ['GraniteModel', GraniteModel]],
['cohere', ['CohereModel', CohereModel]],
@@ -6968,6 +6976,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
['codegen', ['CodeGenForCausalLM', CodeGenForCausalLM]],
['llama', ['LlamaForCausalLM', LlamaForCausalLM]],
['olmo', ['OlmoForCausalLM', OlmoForCausalLM]],
['olmo2', ['Olmo2ForCausalLM', Olmo2ForCausalLM]],
['mobilellm', ['MobileLLMForCausalLM', MobileLLMForCausalLM]],
['granite', ['GraniteForCausalLM', GraniteForCausalLM]],
['cohere', ['CohereForCausalLM', CohereForCausalLM]],
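
Note on the two mapping hunks above: each `Map` entry pairs a `model_type` string with a `[className, classConstructor]` tuple, so registering `'olmo2'` in both maps is what makes the new classes reachable by model type. The sketch below shows one way such a lookup could work. The classes are empty stand-ins, and `resolveModelClass`, `BASE_MODELS`, and `CAUSAL_LM_MODELS` are hypothetical names, not library APIs.

```javascript
// Empty stand-ins; only the class names mirror the diff.
class PreTrainedModel {}
class Olmo2PreTrainedModel extends PreTrainedModel {}
class Olmo2Model extends Olmo2PreTrainedModel {}
class Olmo2ForCausalLM extends Olmo2PreTrainedModel {}

// Same entry shape as the maps in the diff: type -> [name, constructor].
const BASE_MODELS = new Map([["olmo2", ["Olmo2Model", Olmo2Model]]]);
const CAUSAL_LM_MODELS = new Map([["olmo2", ["Olmo2ForCausalLM", Olmo2ForCausalLM]]]);

// Hypothetical resolver: returns the constructor registered for a model type.
function resolveModelClass(mapping, modelType) {
  const entry = mapping.get(modelType);
  if (!entry) {
    throw new Error(`Unsupported model type: ${modelType}`);
  }
  const [, modelClass] = entry;
  return modelClass;
}

const ResolvedClass = resolveModelClass(CAUSAL_LM_MODELS, "olmo2");
const instance = new ResolvedClass();
console.log(instance instanceof Olmo2PreTrainedModel); // true
```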
52 changes: 52 additions & 0 deletions tests/tiny_random.test.js
@@ -25,6 +25,7 @@ import {
// Models
LlamaForCausalLM,
OlmoForCausalLM,
Olmo2ForCausalLM,
GraniteForCausalLM,
CohereModel,
CohereForCausalLM,
@@ -1369,6 +1370,57 @@ describe("Tiny random models", () => {
});
});

describe("olmo2", () => {
  describe("Olmo2ForCausalLM", () => {
    const model_id = "hf-internal-testing/tiny-random-Olmo2ForCausalLM";
    /** @type {Olmo2ForCausalLM} */
    let model;
    /** @type {GPT2Tokenizer} */
    let tokenizer;
    beforeAll(async () => {
      model = await Olmo2ForCausalLM.from_pretrained(model_id, {
        // TODO move to config
        ...DEFAULT_MODEL_OPTIONS,
      });
      tokenizer = await GPT2Tokenizer.from_pretrained(model_id);
      tokenizer.padding_side = "left";
    }, MAX_MODEL_LOAD_TIME);

    it(
      "batch_size=1",
      async () => {
        const inputs = tokenizer("hello");
        const outputs = await model.generate({
          ...inputs,
          max_length: 10,
        });
        expect(outputs.tolist()).toEqual([[15339n, 50957n, 43410n, 77030n, 91444n, 99516n, 80720n, 4608n, 90428n, 22806n]]);
      },
      MAX_TEST_EXECUTION_TIME,
    );

    it(
      "batch_size>1",
      async () => {
        const inputs = tokenizer(["hello", "hello world"], { padding: true });
        const outputs = await model.generate({
          ...inputs,
          max_length: 10,
        });
        expect(outputs.tolist()).toEqual([
          [100277n, 15339n, 50957n, 43410n, 77030n, 91444n, 99516n, 80720n, 4608n, 90428n],
          [15339n, 1917n, 12095n, 21350n, 61586n, 19306n, 39486n, 91527n, 59768n, 31934n],
        ]);
      },
      MAX_TEST_EXECUTION_TIME,
    );

    afterAll(async () => {
      await model?.dispose();
    }, MAX_MODEL_DISPOSE_TIME);
  });
});

describe("granite", () => {
describe("GraniteForCausalLM", () => {
const model_id = "hf-internal-testing/tiny-random-GraniteForCausalLM";
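
Note on the `olmo2` `batch_size>1` test above: the tokenizer is set to `padding_side = "left"`, and the first expected row starts with `100277n` followed by the same ids as the `batch_size=1` case. That is consistent with `100277` being the pad token id, prepended so both rows have equal length before generation. The sketch below illustrates left padding on plain arrays. `leftPad` is a hypothetical helper, not a tokenizer API, and treating `100277` as the pad id is an inference from the expected output rather than something the diff states.

```javascript
// Hypothetical helper: left-pads token id sequences to a common length
// and builds the matching attention mask (0 = padding, 1 = real token).
function leftPad(sequences, padId) {
  const maxLength = Math.max(...sequences.map((seq) => seq.length));
  const input_ids = [];
  const attention_mask = [];
  for (const seq of sequences) {
    const padCount = maxLength - seq.length;
    input_ids.push([...Array(padCount).fill(padId), ...seq]);
    attention_mask.push([...Array(padCount).fill(0), ...Array(seq.length).fill(1)]);
  }
  return { input_ids, attention_mask };
}

// Prompt ids taken from the expected outputs in the test:
// "hello" -> [15339], "hello world" -> [15339, 1917].
// The pad id 100277 is an assumption inferred from the first expected row.
const batch = leftPad([[15339], [15339, 1917]], 100277);
console.log(batch.input_ids);
console.log(batch.attention_mask);
```

With decoder-only models, left padding keeps each prompt's last real token at the end of the row, so generated tokens follow the prompt directly instead of following pad tokens.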